2. Fundamental studies of optical interconnects.
We report on the progress in both of these areas in the two following sections.


Neural  Networks 


The research this year focused on understanding the global as well as the local properties
of the neural network model. Global properties are the dynamics of the network, its
convergence properties, computational power, and capacity. By local properties we
mean the theory of threshold logic elements, the basic building blocks of the network.
Here we mention only two main contributions this year. The first relates to the global
properties and the second to the local properties of the neural network model. The
details of these contributions appear in [1] and [2].

In [1] we investigated the relation between error-correcting codes and neural
networks. The motivation behind this work was that a neural network model can be
viewed as a decoder. The stable states correspond to codewords, the 'probe' vector
corresponds to the received vector, and convergence to the closest stable state
corresponds to Maximum Likelihood Decoding (MLD). We found several natural
ways of connecting the concept of error-correcting codes with the concept of neural
networks. In particular, we showed that the MLD problem in a linear block code is
equivalent to finding the global maximum of the energy function of a neural network
that can be easily constructed from a basis of the code. We also have a dual
result: given a linear block code, we can easily construct a neural network in which
every local energy maximum corresponds to a codeword and every codeword
corresponds to a local maximum, thus solving the 'programming' problem for linear
codes. The results are generalized to both nonbinary and nonlinear codes.

In [2] we answered a fundamental question in the theory of threshold logic. Suppose
that instead of using a linear threshold (LT) element we use a polynomial threshold
(PT) element. A PT element thresholds a polynomial (instead of a linear form), with
the restriction that the number of terms in the polynomial is 'small' (the number of
terms is bounded by some polynomial in the number of variables). The question is:
what is the power of a PT element and how does it compare with that of an LT element?
The answer is that we do not gain much by using PTs instead of LTs: a 'small'
two-layer network of LTs can do strictly more than a single PT element can. In order to
answer this question we developed a novel technique based on harmonic analysis and
derived bounds on the number of terms in the PT representation. As a byproduct we
also found a new way of deriving counting results.
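As a small illustration of the two element types (our own sketch, not the harmonic-analysis technique of [2]), the Python code below encodes inputs as 1 and -1, implements a PT element as the sign of a weighted sum of monomials and an LT element as the sign of a linear form minus a threshold, and checks that the one-term polynomial x0*x1 computes two-bit parity while no LT element with small integer weights does. The helper names, the convention sgn(0) = 1, and the weight range are assumptions made for the example; a two-line argument on the four input points shows that no LT element of any weights computes this function.

```python
from itertools import product

def sgn(v):
    # Threshold convention assumed here: +1 for v >= 0, -1 otherwise.
    return 1 if v >= 0 else -1

def pt_element(terms, x):
    """PT element: sign of sum_k w_k * prod_{i in S_k} x_i over +/-1 inputs x.
    `terms` is a list of (weight, index_tuple) pairs."""
    total = 0
    for w, idx in terms:
        monomial = 1
        for i in idx:
            monomial *= x[i]
        total += w * monomial
    return sgn(total)

def lt_element(weights, threshold, x):
    """LT element: sign of sum_i w_i x_i - threshold."""
    return sgn(sum(w * xi for w, xi in zip(weights, x)) - threshold)

cube = list(product([1, -1], repeat=2))
parity = {x: x[0] * x[1] for x in cube}   # -1 iff an odd number of inputs is -1

# A single-term PT element computes two-bit parity ...
pt_terms = [(1, (0, 1))]
assert all(pt_element(pt_terms, x) == parity[x] for x in cube)

# ... while no LT element with integer weights and threshold in [-3, 3] does.
lt_hits = [(w0, w1, t) for w0, w1, t in product(range(-3, 4), repeat=3)
           if all(lt_element((w0, w1), t, x) == parity[x] for x in cube)]
assert lt_hits == []
```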

The  above  results  are  important  to  both  the  theory  of  neural  networks  and  the  area  of 
circuit  complexity  in  the  theory  of  computer  science. 

References 

[1]  J.  Bruck  and  M.  Blaum,  "Neural  Networks,  Error-Correcting  Codes  and 
Polynomials  over  the  Binary  n-Cube",  accepted  for  publication  in  the  IEEE 
Transactions  on  Information  Theory. 

[2]  J.  Bruck,  "Harmonic  Analysis  of  Polynomial  Threshold  Functions",  submitted  to 
SIAM  Journal  on  Discrete  Mathematics. 

Optical  Interconnections 

Our work on fundamental properties of optical interconnections comprised two different
subtasks. First, we have nearly completed a study comparing coaxial cable
with optical interconnects for use within multiprocessor machines. The analysis is a
rigorous one, and considers many different cases regarding termination, source
impedance, etc. When complete, it will provide guidelines for when it is appropriate to
use optics for this type of interconnection and when it may not be.

A second task has been the examination of a recent paper by S.H. Lee and his group
in which projections of the advantages of optics at the intra-chip level of interconnects
were made. These projections were far more optimistic than our own previous
projections, and it was important to discover the difference between the two analyses
and the reasons for their different predictions. One difference lies in the fact that our
considerations were fundamental ones while theirs were more practical. However,
this difference did not explain the discrepancy in the predictions. We discovered that,
while our own predictions had projected the capabilities of both optics and electronics
into the future, the San Diego group had projected only the capabilities of optics into the
future. The characteristics assumed for electronics were not similarly projected (for
example, 3 μm technology was assumed).

Future  Activities 

Our activities in the coming year will consist of the following:

1. Continuation of our fundamental theoretical studies of neural networks;

2. The beginning of a study of interconnect limits in GaAs technology (all of our
previous studies have been for silicon).

On the Power of Neural Networks for
Solving Hard Problems *

Jehoshua  Bruck 
Joseph  W.  Goodman 
Information  Systems  Laboratory 
Department  of  Electrical  Engineering 
Stanford  University 
Stanford,  CA  94305 


Abstract 

This  paper  deals  with  a  neural  network  model  in  which  each  neuron 
performs  a  threshold  logic  function.  An  important  property  of  the  model 
is  that  it  always  converges  to  a  stable  state  when  operating  in  a  serial 
mode  [2,5].  This  property  is  the  basis  of  the  potential  applications  of  the 
model  such  as  associative  memory  devices  and  combinatorial  optimization 
[3,6]. 

One of the motivations for using the model to solve hard combinatorial
problems is the fact that it can be implemented by optical devices and
thus operate at a higher speed than conventional electronics.

The main theme of this work is to investigate the power of the model for
solving NP-hard problems [4,8], and to understand the relation between
speed of operation and the size of a neural network. In particular, it will
be shown that for any NP-hard problem the existence of a polynomial
size network that solves it implies that NP = co-NP. Also, for the Traveling
Salesman Problem (TSP), even a polynomial size network that gets an
ε-approximate solution does not exist unless P = NP.

The  above  results  are  of  great  practical  interest,  because  right  now  it  is 
possible  to  build  neural  networks  which  will  operate  fast  but  are  limited 
in  the  number  of  neurons. 

*Presented at the IEEE Neural Information Processing Systems Conference, Denver,
Colorado, November 1987.

1 Background

The neural network model is a discrete time system that can be represented by
a weighted and undirected graph. There is a weight attached to each edge of
the graph and a threshold value attached to each node (neuron) of the graph.
The order of the network is the number of nodes in the corresponding graph.
Let N be a neural network of order n; then N is uniquely defined by (W, T)
where:

• W is an n × n symmetric matrix; W_ij is equal to the weight attached to
edge (i, j).

• T is a vector of dimension n; T_i denotes the threshold attached to node i.

Every node (neuron) can be in one of two possible states, either 1 or -1. The
state of node i at time t is denoted by V_i(t). The state of the neural network at
time t is the vector V(t).

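The next state of a node is computed by:

$$V_i(t+1) = \operatorname{sgn}(H_i(t)) = \begin{cases} 1 & \text{if } H_i(t) \geq 0 \\ -1 & \text{otherwise} \end{cases} \qquad (1)$$

where

$$H_i(t) = \sum_{j=1}^{n} W_{ij} V_j(t) - T_i$$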

The next state of the network, i.e. V(t + 1), is computed from the current
state by performing the evaluation (1) at a subset of the nodes of the network,
to be denoted by S. The modes of operation are determined by the method
by which the set S is selected in each time interval. If the computation is
performed at a single node in any time interval, i.e. |S| = 1, then we will say
that the network is operating in a serial mode; if |S| = n then we will say that
the network is operating in a fully parallel mode. All the other cases, i.e.
1 < |S| < n, will be called parallel modes of operation. The set S can be chosen
at random or according to some deterministic rule.
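The modes of operation differ only in which nodes are re-evaluated at each step. A minimal NumPy sketch, assuming the convention sgn(0) = 1 and a randomly generated symmetric W with zero diagonal (both our choices for illustration; `next_state` is a hypothetical helper name), applies equation (1) at the nodes of a given set S and leaves all other nodes unchanged:

```python
import numpy as np

def sgn(h):
    # Elementwise sign with sgn(0) = 1, matching equation (1).
    return np.where(h >= 0, 1, -1)

def next_state(W, T, V, S):
    """Apply equation (1) at the nodes in S; every other node keeps its state."""
    H = W @ V - T                  # H_i(t) for all nodes
    idx = list(S)
    V_next = V.copy()
    V_next[idx] = sgn(H[idx])
    return V_next

# Hypothetical example network of order n = 5.
rng = np.random.default_rng(0)
n = 5
A = rng.integers(-3, 4, size=(n, n))
W = A + A.T                        # symmetric weight matrix
np.fill_diagonal(W, 0)
T = np.zeros(n, dtype=int)
V = rng.choice([-1, 1], size=n)

serial = next_state(W, T, V, S=[int(rng.integers(n))])  # |S| = 1
fully_parallel = next_state(W, T, V, S=range(n))        # |S| = n
parallel = next_state(W, T, V, S=[0, 2])                # 1 < |S| < n
```

Only the choice of S changes between modes; a random S per step gives the random serial mode used for local search.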

A state V(t) is called stable iff V(t) = sgn(WV(t) - T), i.e. there is no
change in the state of the network no matter what the mode of operation is.
One of the most important properties of the model is the fact that it always
converges to a stable state while operating in a serial mode. The main idea in
the proof of the convergence property is to define a so-called energy function
and to show that this energy function is nondecreasing when the state of the
network changes. The energy function is:

$$E(t) = V^T(t) W V(t) - 2 V^T(t) T \qquad (2)$$
An important note is that the energy function was originally defined so that
it is nonincreasing [5]; we changed it so that it agrees with the formulation of some known
graph problems (e.g. Min Cut).
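To see the convergence argument at work, the following sketch (assuming a symmetric W with zero diagonal and the convention sgn(0) = 1; `energy` and `run_serial` are hypothetical names) runs the network in a random serial mode, records the energy (2) after every state change, and stops after a full sweep over the nodes produces no change. With a zero diagonal, flipping node i changes E by -4 V_i H_i (V_i taken before the flip), which is never negative for a flip prescribed by (1).

```python
import numpy as np

def energy(W, T, V):
    # Equation (2): E = V^T W V - 2 V^T T
    return V @ W @ V - 2 * V @ T

def run_serial(W, T, V, rng, max_sweeps=1000):
    """Serial mode: visit the nodes one at a time in a random order, applying (1),
    until a full sweep changes nothing. Returns the final state and energy trace."""
    V = V.copy()
    trace = [energy(W, T, V)]
    n = len(V)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(n):
            h = W[i] @ V - T[i]
            new = 1 if h >= 0 else -1
            if new != V[i]:
                V[i] = new
                changed = True
                trace.append(energy(W, T, V))
        if not changed:
            return V, trace
    raise RuntimeError("no convergence within max_sweeps")

rng = np.random.default_rng(1)
n = 8
A = rng.integers(-5, 6, size=(n, n))
W = A + A.T
np.fill_diagonal(W, 0)             # zero diagonal, as assumed here
T = rng.integers(-3, 4, size=n)
V0 = rng.choice([-1, 1], size=n)

V_final, trace = run_serial(W, T, V0, rng)
assert all(b >= a for a, b in zip(trace, trace[1:]))                  # nondecreasing
assert np.array_equal(V_final, np.where(W @ V_final - T >= 0, 1, -1))  # stable
```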

A neural network operating in a serial mode will always get to a stable state which corresponds to a
local maximum in the energy function. This suggests the use of the network as a
device for performing a local search algorithm for finding a maximal value of the
energy function [6]. Thus, the network will perform a local search by operating
in a random and serial mode. It is also known [2,9] that maximization of E
associated with a given network N in which T = 0 is equivalent to finding
the Minimum Cut in N. Actually, many hard problems can be formulated as
maximization of a quadratic form (e.g. TSP [6]) and thus can be mapped to a
neural network.

2  The  Main  Results 

The  set  of  stable  states  is  the  set  of  possible  final  solutions  that  one  will  get 
using  the  above  approach.  These  final  solutions  correspond  to  local  maxima  of 
the  energy  function  but  do  not  necessarily  correspond  to  global  optima  of  the 
corresponding problem. The main question is: suppose we allow the network to
operate for a very long time until it converges; can we do better than just getting
some local optimum? That is, is it possible to design a network which will always
find the exact solution (or some guaranteed approximation) of the problem?

Definition: Let X be an instance of a problem. Then |X| denotes the size of
X, that is, the number of bits required to represent X. For example, for X
being an instance of TSP, |X| is the number of bits needed to represent the
matrix of the distances between cities.

Definition: Let N be a neural network. Then |N| denotes the size of the
network N, namely, the number of bits needed to represent W and T.

Let us start by defining the desired setup for using the neural network as a
model for solving hard problems.

Consider an optimization problem L. We would like to have for every instance
X of L a neural network N_X with the following properties:

• Every local maximum of the energy function associated with N_X corresponds
to a global optimum of X.

• The network N_X is small, that is, |N_X| is bounded by some polynomial
in |X|.

Moreover, we would like to have an algorithm, to be denoted by A_L, which, given
an instance X ∈ L, generates the description for N_X in polynomial (in |X|)
time.


Now, we will define the desired setup for using the neural network as a model
for finding approximate solutions for hard problems.

Definition: Let E_glo be the global maximum of the energy function. Let E_loc
be a local maximum of the energy function. We will say that a local maximum
is an ε-approximate of the global iff:

$$\frac{E_{glo} - E_{loc}}{E_{glo}} \leq \epsilon$$

The setup for finding approximate solutions is similar to the one for finding
exact solutions. Let ε > 0 be some fixed number. We would like to have a
network N_{X,ε} in which every local maximum is an ε-approximate of the global
maximum and the global maximum corresponds to an optimum of X. The network N_{X,ε} should
be small, namely, |N_{X,ε}| should be bounded by a polynomial in |X|. Also,
we would like to have an algorithm A_{L,ε} such that, given an instance X ∈ L, it
generates the description for N_{X,ε} in polynomial (in |X|) time.
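For small networks the ε-approximation condition can be checked exhaustively. The sketch below assumes that "local maximum" means a state whose energy (2) no single-node flip strictly increases, and that E_glo > 0 so the relative gap is defined; it enumerates all 2^n states, so it only illustrates the definition and is not a practical procedure. The function names are hypothetical.

```python
import numpy as np
from itertools import product

def energy(W, T, V):
    # Equation (2): E = V^T W V - 2 V^T T
    return V @ W @ V - 2 * V @ T

def is_local_max(W, T, V):
    """Assumed meaning: no single-node flip strictly increases the energy."""
    e = energy(W, T, V)
    for i in range(len(V)):
        U = V.copy()
        U[i] = -U[i]
        if energy(W, T, U) > e:
            return False
    return True

def worst_relative_gap(W, T):
    """max over local maxima of (E_glo - E_loc) / E_glo, by exhaustive search."""
    states = [np.array(s) for s in product([-1, 1], repeat=len(T))]
    e_glo = max(energy(W, T, s) for s in states)
    if e_glo <= 0:
        raise ValueError("the relative criterion needs E_glo > 0")
    return max((e_glo - energy(W, T, s)) / e_glo
               for s in states if is_local_max(W, T, s))

def every_local_max_is_eps_approx(W, T, eps):
    return worst_relative_gap(W, T) <= eps

# Two-node example: local maxima (1, 1) with E = -2 and (-1, -1) with E = 6.
W = np.array([[0, 1], [1, 0]])
T = np.array([1, 1])
gap = worst_relative_gap(W, T)    # (6 - (-2)) / 6 = 4/3
```

In this example the non-global local maximum is far from the optimum, so the network qualifies only for ε ≥ 4/3.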

Note  that  in  both  the  exact  case  and  the  approximate  case  we  do  not  put  any 
restriction  on  the  time  it  takes  the  network  to  converge  to  a  solution  (it  can  be 
exponential). 

At this point the reader should be convinced that the above description is
the setup one would imagine for using the neural network model for solving
hard problems, because that is what the following definition is about.

Definition: We will say that a neural network for solving (or finding an
ε-approximation of) a problem L exists if the algorithm A_L (or A_{L,ε}) which
generates the description of N_X (or N_{X,ε}) exists.


The main results in the paper are summarized by the following two
propositions. The first one deals with exact solutions of NP-hard problems while the
second deals with approximate solutions to TSP.

Proposition 1 Let L be an NP-hard problem. Then the existence of a neural
network for solving L implies that NP = co-NP.

Proposition 2 Let ε > 0 be some fixed number. The existence of a neural
network for finding an ε-approximate solution to TSP implies that P = NP.

Both (P = NP) and (NP = co-NP) are believed to be false statements; hence,
we cannot use the model in the way we imagine.
The key observation for proving the above propositions is the fact that a
single iteration in a neural network takes time which is bounded by a polynomial
in the size of the instance of the corresponding problem. The proofs of the above
two propositions follow directly from known results in complexity theory and
should not be considered as new results in complexity theory.

3  The  Proofs 

Proof of Proposition 1: The proof follows from the definitions of the classes
NP and co-NP, and Lemma 1. The definitions and the lemma appear in Chapters
15 and 16 in [8] and also in Chapters 2 and 7 in [4].

Lemma  1  If  the  complement  of  an  NP-complete  problem  is  in  NP, 
then  NP=co-NP. 

Let L be an NP-hard problem. Suppose there exists a neural network that solves
L. Let L' be an NP-complete problem. By definition, L' can be polynomially
reduced to L. Thus, for every instance X ∈ L', we have a neural network such
that from any of its global maxima we can efficiently recognize whether X is a
'yes' or a 'no' instance of L'.

We claim that we have a nondeterministic polynomial time algorithm to decide
that a given instance X ∈ L' is a 'no' instance. Here is how we do it: for X ∈ L'
we construct the neural network that solves it by using the reduction to L. We
then nondeterministically guess a state of the network and check whether it is a local
maximum (this is done in polynomial time). If it is a local maximum, we check whether
the instance is a 'yes' or a 'no' instance (this is also done in polynomial time).

Thus, we have a nondeterministic polynomial time algorithm to recognize any
'no' instance of L'. Therefore, the complement of the problem L' is in NP. But L' is
an NP-complete problem; hence, from Lemma 1 it follows that NP = co-NP. □
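The step "check whether a state is a local maximum in polynomial time" can be made concrete. Assuming a symmetric W with zero diagonal (our assumption), flipping node i changes E by -4 V_i H_i, so a state is a local maximum exactly when V_i H_i ≥ 0 for every i; this needs one matrix-vector product, i.e. O(n^2) operations. The sketch compares this test against explicit single-flip enumeration on random instances; the function names are hypothetical.

```python
import numpy as np

def energy(W, T, V):
    # Equation (2): E = V^T W V - 2 V^T T
    return V @ W @ V - 2 * V @ T

def is_local_max_fast(W, T, V):
    """O(n^2) test, assuming W symmetric with zero diagonal: flipping node i
    changes E by -4 * V_i * H_i, so V is a local maximum iff all V_i * H_i >= 0."""
    H = W @ V - T
    return bool(np.all(V * H >= 0))

def is_local_max_brute(W, T, V):
    # Reference check: try every single-node flip explicitly.
    e = energy(W, T, V)
    for i in range(len(V)):
        U = V.copy()
        U[i] = -U[i]
        if energy(W, T, U) > e:
            return False
    return True

rng = np.random.default_rng(2)
for _ in range(200):
    n = 6
    A = rng.integers(-4, 5, size=(n, n))
    W = A + A.T
    np.fill_diagonal(W, 0)
    T = rng.integers(-3, 4, size=n)
    V = rng.choice([-1, 1], size=n)
    assert is_local_max_fast(W, T, V) == is_local_max_brute(W, T, V)
```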


Proof of Proposition 2: The result is a corollary of the results in [7]; the
reader can refer to it for a more complete presentation.

The proof uses the fact that the Restricted Hamiltonian Circuit (RHC) problem is
NP-complete.

Definition of RHC: Given a graph G = (V, E) and a Hamiltonian path in G,
is there a Hamiltonian circuit in G?

It is proven in [7] that RHC is NP-complete.

Suppose there exists a polynomial size neural network for finding an
ε-approximate solution to TSP. Then it can be shown that an instance X ∈ RHC
can be reduced to an instance X' ∈ TSP such that in the network N_{X',ε}
the following holds: if the Hamiltonian path that is given in X corresponds to a
local maximum in N_{X',ε}, then X is a 'no' instance; otherwise, if it does not correspond
to a local maximum in N_{X',ε}, then X is a 'yes' instance. Note that we can check
whether a state is a local maximum in polynomial time.

Hence, the existence of N_{X,ε} for all X ∈ TSP implies that we have a polynomial
time algorithm for RHC. □

4  Concluding  Remarks 

1. In Proposition 1 we let |W| and |T| be arbitrary but bounded by a
polynomial in the size of a given instance of a problem. If we assume
that |W| and |T| are fixed for all instances, then a similar result to
Proposition 1 can be proved without using complexity theory; this result
appears in [1].

2. The network which corresponds to TSP, as suggested in [6], cannot solve
the TSP with guaranteed quality. However, one should note that all the
analysis in this paper is a worst-case type of analysis. So it might be that
there exist networks that have good behavior on average.

3. Proposition 1 is general to all NP-hard problems while Proposition 2 is
specific to TSP. Both propositions hold for any type of network in which
an iteration takes polynomial time.

4. Clearly, every network has an algorithm which is equivalent to it, but an
algorithm does not necessarily have a corresponding network. Thus, if we
do not know of an algorithmic solution to a problem we also will not be able
to find a network which solves the problem. If one believes that the neural
network model is a good model (e.g. it is amenable to implementation with
optics), one should develop techniques to program the network to perform
an algorithm that is known to have some guaranteed good behavior.

Abstract

We  present  several  ways  of  connecting  the  concept  of  error-correcting  codes  with  the 
concept  of  neural  networks.  We  show  that  performing  maximum  likelihood  decoding 
in  a  linear  block  error-correcting  code  is  equivalent  to  finding  a  global  maximum  of 
the  energy  function  of  a  certain  neural  network.  We  also  show  that  given  a  linear 
block  code  we  can  construct  a  neural  network  such  that  every  local  maximum  of  the 
energy  function  corresponds  to  a  codeword  and  every  codeword  corresponds  to  a  local 
maximum. We derive a representation theory for boolean functions and use it to
extend the results to nonlinear block codes. The connection between maximization
of polynomials over the n-cube and error-correcting codes is also investigated; our
results suggest that decoding techniques can be a useful tool for solving problems of
maximization of polynomials over the n-cube.

1 Introduction


The  main  goal  in  the  paper  is  to  explore  the  connections  between  the  three  concepts  in  the 
title. 

A  neural  network  is  a  computational  model  that  has  recently  been  attracting  a  lot  of  interest 
because  it  seems  to  have  properties  that  are  similar  to  those  of  both  biological  and  physical 
systems. The computation that is performed in a neural network is a maximization of a so-called
energy function. The state space of a neural network can be described by the topography
which is defined by the energy function associated with the network.

The  main  problem  in  the  field  of  error-correcting  codes  is  to  design  good  codes:  codes  that 
can  correct  many  errors  and  whose  encoding  and  decoding  procedures  are  computationally 
efficient. An error-correcting code can be described by a topography, with the peaks of
the topography being the codewords. The decoding of a corrupted word (a point in the
topography which is not a peak) is then equivalent to looking for the closest peak in the
topography.

The  above  analogy  between  the  two  subjects  was  the  initial  motivation  for  this  work. 

It turns out that both neural networks and error-correcting codes can be described by
polynomials over the n-cube. Thus, the connection between the two concepts can be established.
The representation of error-correcting codes using polynomials over the n-cube also gives a
new perspective on the subject that enables us to derive new proofs of known results.
The problem of maximization of polynomials over the n-cube is a known problem in
operations research and computer science. The connection with error-correcting codes suggests a
new tool for solving these problems, namely, decoding techniques.

The  paper  is  organized  as  follows:  In  Section  2,  we  present  some  background  on  neural 
networks.  We  review  the  basic  definitions  of  the  Hopfield  model.  We  discuss  stable  states 
and the different modes of operation of the network. We conclude the section by proving
that finding a global maximum of the energy function of the network is equivalent to finding
a minimum cut in a certain graph. The generalization to energy functions of higher degree
is also reviewed.

In Section 3, we establish a connection between the Hopfield model and graph theoretic
codes. We prove that maximum likelihood decoding in a graph theoretic code is equivalent
to finding the minimum cut in a certain graph. By the previous section, this implies that
maximum likelihood decoding in a graph theoretic code is equivalent to finding a maximum
of the energy in a neural network.
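A small sketch of this connection, under assumptions of our own: the graph theoretic code is taken to be the cut space of a graph (one code bit per edge, a codeword being the set of edges of a cut), symbols are written as 1 and -1 so that edge (u, v) carries x_u x_v for node states x, and maximum likelihood decoding means finding the codeword at minimum Hamming distance d from the received word y (hard decisions on a binary symmetric channel). Since the sum over edges of y_e x_u x_v equals m - 2d for m edges, placing the weight y_e on edge e gives a network with T = 0 whose energy is 2(m - 2d). The example graph K4 and the received word are hypothetical.

```python
import numpy as np
from itertools import product

# Hypothetical example: the complete graph K4, one code bit per edge.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
n_nodes, m = 4, len(edges)

def cut_codeword(x):
    """+/-1 codeword of the cut defined by node states x: bit e is -1 iff e crosses the cut."""
    return np.array([x[u] * x[v] for u, v in edges])

def network_weights(y):
    """T = 0 network with W_uv = W_vu = y_e, so x^T W x = 2 * sum_e y_e x_u x_v."""
    W = np.zeros((n_nodes, n_nodes), dtype=int)
    for (u, v), ye in zip(edges, y):
        W[u, v] = W[v, u] = ye
    return W

def hamming(x, y):
    return int(np.sum(cut_codeword(x) != y))

y = np.array([1, -1, 1, 1, -1, -1])          # hypothetical received word
W = network_weights(y)
states = [np.array(x) for x in product([-1, 1], repeat=n_nodes)]

for x in states:
    # Energy and Hamming distance are affinely related, E = 2 * (m - 2d),
    # so the maximum-energy state decodes to the nearest cut codeword.
    assert x @ W @ x == 2 * (m - 2 * hamming(x, y))

best_dist = min(hamming(x, y) for x in states)
best_energy = max(x @ W @ x for x in states)
```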

In  Section  4,  we  extend  the  results  of  Section  3  to  general  linear  block  codes.  The  key  idea 
is  to  represent  the  binary  symbols  {0, 1}  by  the  symbols  {1,-1}  with  the  operation  being 
multiplication  instead  of  exclusive  OR.  A  general  energy  function,  not  necessarily  quadratic, 
is  defined  based  on  the  generator  matrix  of  a  given  linear  block  code.  We  show  that  finding 
the  global  maximum  of  this  energy  function  is  equivalent  to  maximum  likelihood  decoding 
in the code. Some of the results are generalized to finite fields GF(p), p a prime. The
idea is to represent the elements as p-th roots of unity. For the cases p = 3 and p = 5, the
energy function is generalized. For p = 3, maximizing the energy function is equivalent to
maximum likelihood decoding. The same is true for p = 5, but with respect to the Lee
distance. Several examples with Hamming and first order Reed-Muller codes are given.
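The following sketch illustrates the Section 4 construction under stated assumptions: bits are mapped by 0 → 1 and 1 → -1, so exclusive OR becomes multiplication; G is one common systematic generator matrix of a [7,4] Hamming code (our choice); and maximum likelihood decoding is taken to mean minimum Hamming distance decoding. The energy, the sum over j of y_j times the product of the x_i with G_ij = 1, equals n - 2d(y, c(x)), so its global maximum over the message x is the decoded message. Three columns of G have weight 3, so the energy has cubic terms, which is why a quadratic energy does not suffice in general. Decoding is done here by exhaustive search over the 16 messages.

```python
import numpy as np
from itertools import product

# Systematic generator matrix of a [7,4] Hamming code (one common choice).
G = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])
k, n = G.shape

def codeword_pm(x):
    """+/-1 codeword for message x in {1,-1}^k: bit j is the product of the
    message symbols x_i with G[i, j] = 1 (XOR becomes multiplication)."""
    return np.array([np.prod(x[G[:, j] == 1]) for j in range(n)])

def energy(x, y):
    # E_G(x) = sum_j y_j * prod_{i: G_ij = 1} x_i, a polynomial over {1,-1}^k.
    return int(y @ codeword_pm(x))

def decode_by_energy(y):
    messages = [np.array(x) for x in product([1, -1], repeat=k)]
    return max(messages, key=lambda x: energy(x, y))

# Encode a message, flip one code symbol, and decode by maximizing the energy.
msg = np.array([1, -1, -1, 1])
y = codeword_pm(msg)
y[5] = -y[5]                      # a single channel error
decoded = decode_by_energy(y)
assert np.array_equal(decoded, msg)
```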

In  Section  5  we  study  the  energy  function  associated  with  the  parity  check  matrix  of  a  code. 
When  this  matrix  is  written  in  systematic  form,  we  show  that  each  codeword  corresponds  to 
a  local  maximum  of  the  polynomial  associated  with  the  parity  check  matrix,  and  that  each 
local maximum corresponds to a codeword. We interpret the results of this section as the dual
of those in Section 4 for defining the maximum likelihood decoding problem.
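A brute-force check of the Section 5 statement for a small code, with our own choices: a systematic parity-check matrix H = [A | I] of a [7,4] Hamming code, the encoding 0 → 1, 1 → -1, and the polynomial E_H(x) given by the sum, over the rows i of H, of the product of the x_j with H_ij = 1 (our reading of "the polynomial associated with the parity check matrix"). A codeword satisfies every check, so it attains the maximum value r. If some check i fails, the bit in the identity column of row i appears in no other check, so flipping it raises E_H by 2; hence no non-codeword is a local maximum.

```python
import numpy as np
from itertools import product

# Systematic parity-check matrix [A | I] of a [7,4] Hamming code (assumed example).
H = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
])
r, n = H.shape

def energy_H(x):
    # One term per parity check: the product of the +/-1 symbols it covers.
    return sum(int(np.prod(x[H[i] == 1])) for i in range(r))

def is_local_max(x):
    # No single-symbol flip strictly increases E_H.
    e = energy_H(x)
    for j in range(n):
        z = x.copy()
        z[j] = -z[j]
        if energy_H(z) > e:
            return False
    return True

words = [np.array(w) for w in product([1, -1], repeat=n)]
codewords = {tuple(w) for w in words if energy_H(w) == r}   # all checks satisfied
local_maxima = {tuple(w) for w in words if is_local_max(w)}
assert codewords == local_maxima and len(codewords) == 16
```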

In Section 6, several ways of representing boolean functions are discussed. A boolean function
is defined as a mapping f : {0, 1}^n → {0, 1}. Given the results of the previous sections, we
are  interested  in  representing  it  with  the  symbols  1  and  -1.  We  show  how  to  transform 
a  boolean  function  to  an  equivalent  polynomial  over  {1,-1}  and  its  inverse  transform;  a 
boolean  function  to  an  equivalent  polynomial  over  {0,1}  and  its  inverse  transform;  and  a 
polynomial  over  {1,-1}  to  a  polynomial  over  {0,1}  and  its  inverse  transform.  The  results 
are  used  to  generalize  the  results  in  Section  4  to  nonlinear  codes. 
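One standard way to compute these transforms is sketched below. The assumptions are ours: a bit b is encoded as (-1)^b; the coefficients over {1, -1} come from the Walsh-Hadamard transform a_S = 2^{-n} Σ_x F(x) Π_{i∈S} x_i; and the coefficients over {0, 1} come from Möbius inversion c_S = Σ_{T⊆S} (-1)^{|S|-|T|} f(1_T), where 1_T is the indicator vector of T. The third transform, between the two polynomial forms, follows by substituting x_i = 1 - 2b_i (or b_i = (1 - x_i)/2) and expanding. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def subsets(n):
    # All index subsets S of {0, ..., n-1}, as sorted tuples.
    for size in range(n + 1):
        yield from combinations(range(n), size)

def pm_coefficients(f, n):
    """a_S with (-1)^f(b) = sum_S a_S prod_{i in S} x_i, where x_i = (-1)^b_i."""
    coeffs = {}
    for S in subsets(n):
        total = sum((-1) ** f(b) * (-1) ** sum(b[i] for i in S)
                    for b in product([0, 1], repeat=n))
        coeffs[S] = Fraction(total, 2 ** n)
    return coeffs

def zero_one_coefficients(f, n):
    """c_S with f(b) = sum_S c_S prod_{i in S} b_i, by Moebius inversion."""
    coeffs = {}
    for S in subsets(n):
        c = 0
        for T in subsets(len(S)):
            point = [0] * n
            for t in T:
                point[S[t]] = 1      # indicator vector of the subset {S[t] : t in T}
            c += (-1) ** (len(S) - len(T)) * f(tuple(point))
        coeffs[S] = c
    return coeffs

def eval_pm(coeffs, x):
    total = Fraction(0)
    for S, a in coeffs.items():
        term = a
        for i in S:
            term *= x[i]
        total += term
    return total

def eval_zero_one(coeffs, b):
    return sum(c for S, c in coeffs.items() if all(b[i] == 1 for i in S))

def majority(b):
    return int(sum(b) >= 2)

A = pm_coefficients(majority, 3)
C = zero_one_coefficients(majority, 3)
for b in product([0, 1], repeat=3):
    x = tuple((-1) ** bi for bi in b)
    assert eval_pm(A, x) == (-1) ** majority(b)
    assert eval_zero_one(C, b) == majority(b)
```

For the majority function the sketch gives 1/2 on each single variable and -1/2 on x_0 x_1 x_2 over {1, -1}, and b_0 b_1 + b_0 b_2 + b_1 b_2 - 2 b_0 b_1 b_2 over {0, 1}.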

In  Section  7,  we  consider  the  problem  of  solving  unconstrained  nonlinear  0-1  programs.  This 
is  basically  the  problem  of  maximizing  a  polynomial  on  n  variables,  each  variable  being  0 
or  1.  It  is  known  that  this  problem  is  NP-hard.  The  known  solvable  cases  use  the  concept 
of  the  conflict  graph.  We  found  that  the  family  of  polynomials  associated  with  Hamming 
codes  results  in  a  conflict  graph  which  is  not  bipartite  in  general  (i.e.  for  which  an  efficient 
algorithm  is  not  known).  For  the  family  of  polynomials  associated  with  Hamming  codes 
efficient  recognition  and  maximization  techniques  (which  are  based  on  decoding  techniques) 
are  presented. 

A note regarding the notation. Since G denotes both a graph (in graph theory) and a generator
matrix (in coding theory), we decided to put 'hats' on all notations which are related to
graphs. That is, a graph is denoted by $\hat{G} = (\hat{V}, \hat{E})$, while a generator matrix of a code is
denoted by G.

2  Background  on  Neural  Networks 

The  neural  network  model  is  a  discrete  time  system  that  can  be  represented  by  a  weighted 
and  undirected  graph.  There  is  a  weight  attached  to  each  edge  of  the  graph  and  a  threshold 
value  attached  to  each  node  (neuron)  of  the  graph.  The  order  of  the  network  is  the  number 
of nodes in the corresponding graph. Let N be a neural network of order n; then N is
uniquely defined by (W, T) where:

• W is an n × n symmetric matrix; W_ij is equal to the weight attached to edge (i, j).

• T is a vector of dimension n; T_i denotes the threshold attached to node i.

Every node (neuron) can be in one of two possible states, either 1 or -1. The state of node i
at time t is denoted by V_i(t). The state of the neural network at time t is the vector V(t).
The next state of a node is computed by:

$$V_i(t+1) = \operatorname{sgn}(H_i(t)) = \begin{cases} 1 & \text{if } H_i(t) \geq 0 \\ -1 & \text{otherwise} \end{cases} \qquad (1)$$

where

$$H_i(t) = \sum_{j=1}^{n} W_{ij} V_j(t) - T_i$$

The next state of the network, i.e. V(t + 1), is computed from the current state by performing
the evaluation (1) at a subset of the nodes of the network, to be denoted by S. The modes of
operation are determined by the method by which the set S is selected in each time interval.
If the computation is performed at a single node in any time interval, i.e. |S| = 1, then we
will say that the network is operating in a serial mode, and if |S| = n then we will say that
the network is operating in a fully parallel mode. All the other cases, i.e. 1 < |S| < n,
will be called parallel modes of operation. The set S can be chosen at random or according
to some deterministic rule.

A state V(t) is called stable iff V(t) = sgn(WV(t) - T), i.e. there is no change in the state of
the network no matter what the mode of operation is. One of the most important properties
of the model is its convergence property, as summarized by the following proposition.


Proposition 2.1 [5,7,13] Let N = (W, T) be a neural network, with W being a symmetric
matrix. Then the network N always converges to a stable state while operating in a serial
mode (provided the diagonal elements of W are nonnegative), and to a cycle of length at
most 2 while operating in a fully parallel mode.
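A quick empirical check of the fully parallel part of Proposition 2.1, with assumptions of our own: real-valued random weights (so ties H_i = 0 essentially never occur) and sgn(0) = 1 otherwise. Because the fully parallel update is a deterministic map on a finite set of states, every trajectory eventually repeats; the sketch records the length of the cycle that is entered and checks that it is 1 or 2 for random symmetric W. A random check is of course not a proof, and `first_cycle_length` is a hypothetical name.

```python
import numpy as np

def first_cycle_length(W, T, V, max_steps=100_000):
    """Iterate V(t+1) = sgn(W V(t) - T) at all nodes (fully parallel mode) until a
    state repeats; return the length of the cycle that is entered."""
    seen = {}
    state = V.copy()
    for t in range(max_steps):
        key = tuple(state.tolist())
        if key in seen:
            return t - seen[key]
        seen[key] = t
        state = np.where(W @ state - T >= 0, 1, -1)
    raise RuntimeError("no repeated state within max_steps")

rng = np.random.default_rng(3)
lengths = []
for _ in range(200):
    n = 10
    A = rng.normal(size=(n, n))
    W = A + A.T                    # symmetric; diagonal left arbitrary
    T = rng.normal(size=n)
    V0 = rng.choice([-1, 1], size=n)
    lengths.append(first_cycle_length(W, T, V0))

assert set(lengths) <= {1, 2}
```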


The  main  idea  in  the  proof  of  the  convergence  property  is  to  define  a  so  called  energy 
function  and  to  show  that  this  energy  function  is  nondecreasing  when  the  state  of  the 
network  changes  as  a  result  of  computation.  The  energy  function  used  in  the  proof  of 
Proposition  2.1  is: 

$$E(t) = V^T(t) W V(t) - 2 V^T(t) T \qquad (2)$$

A  neural  network  when  operating  in  a  serial  mode  will  always  get  to  a  stable  state  which 
corresponds  to  a  local  maximum  in  the  energy  function.  This  suggests  the  use  of  the  network 
as  a  device  for  performing  a  local  search  algorithm  for  finding  a  maximal  value  of  the  energy 
function [4,5,14]. Clearly, every optimization problem which can be defined in the form of a
quadratic function over {-1, 1}^n as in (2) can be mapped to a neural network which will
perform a search for its optimum. One of the optimization problems which is not only
representable by a quadratic function but actually is equivalent to it is the problem of finding
the Minimum Cut in a graph [5,18]. In order to make the above statement clear let us start
by defining the term cut in a graph.


Definition: Let G = (V, E) be a weighted and undirected graph, with W being a symmetric
matrix of the weights of the edges of G. Let V_1 be a subset of V, and let V_{-1} = V - V_1. The
set of edges each of which is incident at a node in V_1 and at a node in V_{-1} is called a cut in
G.

Definition: The weight of a cut is the sum of its edge weights. A Minimum Cut (MC) of a
graph is a cut with minimum weight.

The equivalence between the MC problem and the problem of maximizing the energy function of a neural network is summarized by the following proposition (generalizations of this equivalence can be found in [4,5]). We include the proof in order to exhibit a principle that will be useful in the sequel.

Proposition 2.2 [5,18] Let $N = (W, T)$ be a neural network with all thresholds equal to zero, i.e., $T = 0$. The problem of finding a state $V$ for which the energy $E$ is maximum is equivalent to finding a minimum cut in the graph corresponding to $N$.

Proof: Since $T = 0$, the energy function is:

$$E = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{i,j} V_i V_j \qquad (3)$$

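Let $W^{++}$ denote the sum of the weights of edges in $N$ with both end points equal to 1, and let $W^{--}$ and $W^{+-}$ denote the corresponding sums for the other two cases. Since $W$ is symmetric, each edge contributes twice to the double sum in (3). It follows that: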

$$E = 2\,(W^{++} + W^{--} - W^{+-}) \qquad (4)$$

which  also  can  be  written  as: 

$$E = 2\,(W^{++} + W^{--} + W^{+-}) - 4W^{+-} \qquad (5)$$

Since the first term in the above equation is constant (it is twice the sum of the weights of all the edges), maximizing $E$ is equivalent to minimizing $W^{+-}$. Clearly, $W^{+-}$ is the weight of the cut in $N$ with $V_1$ being the set of nodes of $N$ whose state equals 1. □
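Identity (5) and the equivalence of Proposition 2.2 can be checked by brute force on a small graph. The graph and weights below are hypothetical, chosen only for illustration; some weights are negative because, with nonnegative weights only, the empty cut is trivially minimal.

```python
import itertools
from typing import List, Tuple

# Hypothetical weighted graph: (node a, node b, weight w_ab), each edge listed once.
EDGES: List[Tuple[int, int, int]] = [
    (0, 1, 3), (0, 2, -1), (1, 2, 2), (1, 3, -2), (2, 3, 1), (0, 3, -3),
]
N_NODES = 4


def energy(state: Tuple[int, ...]) -> int:
    """Energy (3) with T = 0; W symmetric with zero diagonal, so each edge counts twice."""
    return sum(2 * w * state[a] * state[b] for a, b, w in EDGES)


def cut_weight(state: Tuple[int, ...]) -> int:
    """W^{+-}: total weight of the edges whose end points are in different states."""
    return sum(w for a, b, w in EDGES if state[a] != state[b])


total = sum(w for _, _, w in EDGES)
states = list(itertools.product((1, -1), repeat=N_NODES))

# Identity (5): E = 2 * (sum of all edge weights) - 4 * W^{+-}
assert all(energy(s) == 2 * total - 4 * cut_weight(s) for s in states)

best_energy = max(energy(s) for s in states)
min_cut = min(cut_weight(s) for s in states)
print("max E =", best_energy, " min cut =", min_cut)
```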

Hence, a neural network operating in serial mode is equivalent to a local search algorithm for finding a minimum cut in the network. Changing the state of a node in the network is equivalent to moving it from one side of the cut to the other in the local search algorithm.

The above definition of the model results in an energy function that is quadratic. The definition of the model can be generalized to energy functions of higher degree [1]. In the general case, every neuron computes an algebraic threshold function, which is equivalent to checking which state (either 1 or -1) results in a higher value of the energy function.

Example: Consider the energy function:

$$E = W_{1,2,3} V_1 V_2 V_3 + W_{1,2} V_1 V_2 + W_{2,3} V_2 V_3 + W_1 V_1$$

The generalization of (1) for node 1 is:

$$V_1(t+1) = \mathrm{sgn}(H_1(t))$$

where

$$H_1(t) = W_{1,2,3} V_2(t) V_3(t) + W_{1,2} V_2(t) + W_1$$

that is, $H_1$ collects the coefficients of the terms of $E$ that contain $V_1$.
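Because $E = V_1 H_1 + W_{2,3} V_2 V_3$, choosing $V_1 = \mathrm{sgn}(H_1)$ can never lower the energy. The sketch below checks this for the example energy function; the weight values are hypothetical, chosen only to exercise the check, and $\mathrm{sgn}(0) = 1$ is assumed.

```python
import itertools

# Hypothetical weights for the example energy function (not taken from the text).
W123, W12, W23, W1 = 1.5, -2.0, 0.5, 0.75


def energy(v1: int, v2: int, v3: int) -> float:
    return W123 * v1 * v2 * v3 + W12 * v1 * v2 + W23 * v2 * v3 + W1 * v1


def field_node1(v2: int, v3: int) -> float:
    """H_1: the coefficient of V_1 in E."""
    return W123 * v2 * v3 + W12 * v2 + W1


def update_node1(v2: int, v3: int) -> int:
    """Generalized rule V_1(t+1) = sgn(H_1(t)), with sgn(0) taken as +1."""
    return 1 if field_node1(v2, v3) >= 0 else -1


for v2, v3 in itertools.product((1, -1), repeat=2):
    v1 = update_node1(v2, v3)
    # E(v1, ...) - E(-v1, ...) = 2 * v1 * H_1 >= 0
    print(v2, v3, "->", v1, energy(v1, v2, v3) >= energy(-v1, v2, v3))
```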

We will start by investigating the connections between quadratic energy functions and error-correcting codes, and then continue by looking at general energy functions.


3  Neural  Networks  and  Graph  Theoretic  Codes 

The  main  goal  of  this  section  is  to  establish  the  relations  between  neural  networks  and  graph 
theoretic  error  correcting  codes.  Let  us  start  by  defining  the  family  of  graph  theoretic  codes 
(for  more  details  see  [8, 17]). 

Let $G = (V, E)$ be an undirected graph, with $V$ the set of nodes of $G$ and $E$ the set of edges of $G$. A subset of the set of edges of $G$ can be represented by a characteristic vector of length $|E|$, with edge $e_i$ corresponding to the $i$th entry of the characteristic vector. That is, every $S \subseteq E$ can be represented by a vector, to be denoted by $1_S$, such that:

$$1_S(i) = \begin{cases} 1 & \text{if } e_i \in S \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$


Definition: The incidence matrix of a graph $G = (V, E)$, to be denoted by $D_G$, is a $|V| \times |E|$ matrix in which row $i$ is the characteristic vector of the set of edges incident upon node $i \in V$.

The  following  facts  from  graph  theory  [3]  are  the  basis  for  the  definition  of  the  family  of 
graph  theoretic  codes. 

Fact 1: The set of characteristic vectors that correspond to the cuts in a connected graph $G = (V, E)$ forms a linear vector space over GF(2), with dimension $|V| - 1$. The linear vector space that corresponds to the cuts of a graph $G$ will be called the cut space of $G$.

It is interesting to note that the circuits in a graph also constitute a linear vector space.

Fact 2: Given a connected graph $G = (V, E)$, the incidence matrix of $G$ has rank $|V| - 1$ over GF(2). Every row of $D_G$ is the characteristic vector of a cut, and every $|V| - 1$ rows of $D_G$ form a basis for the cut space of $G$.

Hence, given a connected graph $G$, the cut space of the graph is a linear block code [15,17] of dimension $|V| - 1$; thus, every graph has an $[|E|, |V| - 1]$ code associated with its cuts.

The code associated with the cuts of a graph $G$ will be denoted by $C_G$.

The codes associated with graphs, that is, cut codes and circuit codes, are called graph theoretic codes. In this paper only cut codes will be discussed.

Figure 1: A graph which corresponds to an (8,4) code.

Example: Let $G$ be the graph in Figure 1; $G$ has 5 nodes and 8 edges. The incidence matrix of the graph $G$ is:

$$D_G = \begin{pmatrix} 1&1&1&0&0&0&0&0 \\ 0&0&1&1&1&0&0&0 \\ 0&1&0&1&0&1&1&0 \\ 1&0&0&0&0&1&0&1 \\ 0&0&0&0&1&0&1&1 \end{pmatrix} \qquad (7)$$

Any 4 rows of $D_G$ form a basis of the cut space of $G$. For example, the matrix that consists of the first 4 rows of $D_G$ is a generator matrix of the error-correcting linear block code associated with $G$. It is easily observed that $G$ does not contain a cut with fewer than 3 edges (besides the empty cut); thus, the code $C_G$ has minimum distance 3 and can correct one error.
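The claims about the code of Figure 1 can be checked mechanically by enumerating the GF(2) span of the rows of $D_G$ in (7). This is a brute-force sketch, practical only for small codes; `span_gf2` is a hypothetical helper name.

```python
import itertools
from typing import List, Set, Tuple

# Incidence matrix (7) of the graph in Figure 1: rows are nodes, columns are edges.
D_G = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 1],
]


def span_gf2(rows: List[List[int]]) -> Set[Tuple[int, ...]]:
    """All GF(2) linear combinations of the given rows."""
    words = set()
    for coeffs in itertools.product((0, 1), repeat=len(rows)):
        word = [0] * len(rows[0])
        for c, row in zip(coeffs, rows):
            if c:
                word = [a ^ b for a, b in zip(word, row)]
        words.add(tuple(word))
    return words


code = span_gf2(D_G[:4])  # the first 4 rows used as a generator matrix
d_min = min(sum(w) for w in code if any(w))
print(len(code), d_min)
```

Including the fifth row does not enlarge the span, since every column of $D_G$ contains exactly two 1's and the five rows therefore sum to zero over GF(2).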

Given a graph $G$, an interesting question is how to formulate the Maximum Likelihood Decoding (MLD) problem of the code $C_G$ in graph theoretic language. That is, given a graph $G = (V, E)$ and a vector $Y$ in $\{0,1\}^{|E|}$, what is the codeword in $C_G$ that is closest to $Y$ in Hamming distance? The following lemma answers this question.

Lemma 3.1 Let $G = (V, E)$ be a connected graph. Let $C_G$ be the code associated with $G$. Let $Y = (y_1, \ldots, y_{|E|})$ be a vector over $\{0,1\}^{|E|}$. Construct a new graph, to be denoted by $G_Y$, by assigning weights to the edges of $G$ as follows:

$$w_i = (-1)^{y_i} \qquad (8)$$

where $w_i$ is the weight associated with edge $i$ in $G$. Then MLD of $Y$ with respect to $C_G$ is equivalent to finding the minimum cut in $G_Y$.


Proof: Let us assume that $Y$ contains $a$ 1's. Let $M$ be an arbitrary codeword in $C_G$. Let $N^{i,j}$ denote the number of positions in which $M$ contains an $i \in \{0,1\}$ and $Y$ contains a $j \in \{0,1\}$. Clearly,

$$a = N^{0,1} + N^{1,1}$$

Thus,

$$N^{1,0} - N^{1,1} = N^{1,0} + N^{0,1} - a \qquad (9)$$

The left hand side of (9) is the weight of the cut $M$ in $G_Y$: edges of the cut with $y_i = 0$ contribute $+1$ and those with $y_i = 1$ contribute $-1$. The sum $N^{1,0} + N^{0,1}$ on the right hand side is the Hamming distance between $M$ and $Y$. Minimizing the right hand side of (9) over all $M \in C_G$ is therefore equivalent to finding the codeword closest to $Y$. On the other hand, minimizing the left hand side is equivalent to finding the minimum cut in $G_Y$. □
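Equation (9) can be checked numerically on the code of Figure 1: for every codeword $M$, the weight of the cut $M$ in $G_Y$ equals $d_H(M, Y) - a$. The received word $Y$ below (the first row of $D_G$ with its last bit flipped) is an arbitrary choice made only for this illustration.

```python
import itertools
from typing import List, Sequence, Set, Tuple

D_G = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 1],
]


def span_gf2(rows: List[List[int]]) -> Set[Tuple[int, ...]]:
    words = set()
    for coeffs in itertools.product((0, 1), repeat=len(rows)):
        word = [0] * len(rows[0])
        for c, row in zip(coeffs, rows):
            if c:
                word = [a ^ b for a, b in zip(word, row)]
        words.add(tuple(word))
    return words


def cut_weight_in_GY(M: Sequence[int], Y: Sequence[int]) -> int:
    """Weight of the cut M in G_Y, where edge i has weight w_i = (-1)^{y_i} as in (8)."""
    return sum((-1) ** y for m, y in zip(M, Y) if m)


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(u, v))


code = span_gf2(D_G[:4])
Y = (1, 1, 1, 0, 0, 0, 0, 1)
a = sum(Y)
for M in code:
    assert cut_weight_in_GY(M, Y) == hamming(M, Y) - a  # equation (9)

decoded = min(code, key=lambda M: cut_weight_in_GY(M, Y))  # minimum cut in G_Y
print(decoded)
```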

From  Lemma  3.1  it  follows  that: 

Theorem 3.1 Let $G = (V, E)$ be a connected graph. Then MLD of a word $Y$ with respect to $C_G$ is equivalent to finding the maximum of the energy function $E$ of the neural network that is defined by the graph $G_Y$ and whose threshold values are all equal to 0.

Proof: By Lemma 3.1, MLD of $Y$ with respect to $C_G$ is equivalent to finding the minimum cut in $G_Y$. By Proposition 2.2, finding the minimum cut in a graph is equivalent to finding the maximum of the energy function of the neural network defined by the graph with all thresholds equal to zero. □
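Theorem 3.1 can be illustrated by building the zero-threshold network of $G_Y$ for the graph of Figure 1 and maximizing its energy by exhaustive search over the $2^5$ node states (a serial-mode run would only guarantee a local maximum). The sketch assumes the node and edge numbering of (7); `weight_matrix` and `decode_by_energy` are hypothetical names.

```python
import itertools

D_G = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 1],
]
N_NODES, N_EDGES = len(D_G), len(D_G[0])
# The end points of edge e are the two rows of D_G with a 1 in column e.
ENDPOINTS = [tuple(i for i in range(N_NODES) if D_G[i][e]) for e in range(N_EDGES)]


def weight_matrix(Y):
    """Symmetric W of G_Y: W[a][b] = (-1)^{y_e} for edge e = {a, b}; zero diagonal."""
    W = [[0] * N_NODES for _ in range(N_NODES)]
    for (a, b), y in zip(ENDPOINTS, Y):
        W[a][b] = W[b][a] = (-1) ** y
    return W


def decode_by_energy(Y):
    """Maximize the energy (3) of the network for G_Y and return the resulting cut."""
    W = weight_matrix(Y)

    def energy(V):
        return sum(W[i][j] * V[i] * V[j] for i in range(N_NODES) for j in range(N_NODES))

    V = max(itertools.product((1, -1), repeat=N_NODES), key=energy)
    # The cut defined by V: edges whose end points are in different states.
    return tuple(int(V[a] != V[b]) for a, b in ENDPOINTS)


print(decode_by_energy((1, 1, 1, 0, 0, 0, 0, 1)))
```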

Graph theoretic error-correcting codes are limited in the sense that [8]:

$$d^* \le \frac{2|E|}{|V|} = \frac{2n}{k+1} \qquad (10)$$

where $d^*$ is the minimum distance of the code, $n = |E|$ and $k = |V| - 1$. For example, a [7,4] Hamming code is not a graph theoretic code, because it has minimum distance 3 while (10) would require $d^* \le \frac{14}{5} < 3$. Hence, an interesting
question  is  whether  the  equivalence  stated  by  Theorem  3.1  can  be  generalized  to  all  linear 
block  codes.  The  energy  function  associated  with  the  MLD  of  graph  theoretic  codes  is 
quadratic.  It  turns  out  that  the  energy  function  associated  with  the  MLD  of  a  general  linear 
block  code  is  a  polynomial  over  the  n-cube.  The  discussion  regarding  the  generalization  is 
the  subject  of  Section  4. 


4  Error-Correcting  Codes  and  Energy  Functions 

In this section we extend the results of Section 3 and show that the MLD problem of linear block codes is equivalent to the maximization of polynomials over the binary n-cube. It will also be shown that the results can be generalized to non-binary codes.


4.1  The  Binary  Case 

Consider a binary linear $[n,k]$ error-correcting block code, to be denoted by $C$ [15,17]. The code $C$ is defined by a $k \times n$ generator matrix $G = (g_{i,j})$. An information vector $b = (b_1, b_2, \ldots, b_k)$ is encoded into the codeword $v = (v_1, v_2, \ldots, v_n)$ such that:

$$v_j = \bigoplus_{i=1}^{k} b_i\, g_{i,j}, \qquad 1 \le j \le n$$

where $\oplus$ denotes Exclusive OR.
The key idea in the derivation is to represent the symbols of the additive group $Z_2$ as symbols in the multiplicative group $\{1,-1\}$ through the transformation

$$a \to (-1)^a$$


We will use a different notation for the $\{1,-1\}$ representation: the information vector $b = (b_1, b_2, \ldots, b_k)$ is represented as $x = (x_1, x_2, \ldots, x_k)$, where $x_i = (-1)^{b_i}$, and the encoded codeword $v = (v_1, v_2, \ldots, v_n)$ is represented as $y = (y_1, y_2, \ldots, y_n)$. Hence,

$$y_j = (-1)^{v_j} = \prod_{i=1}^{k} (-1)^{b_i g_{i,j}} = \prod_{i=1}^{k} x_i^{g_{i,j}} \qquad (11)$$
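Equation (11) says that the $\{1,-1\}$ codeword is obtained by multiplying, rather than XOR-ing, the information symbols selected by each column of $G$. A minimal check of this correspondence, using the [7,4] generator matrix of the example that follows (function names are hypothetical):

```python
import itertools
from math import prod

G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode_gf2(b, G):
    """Additive representation: v_j = XOR over i of b_i g_ij."""
    return [sum(bi * gij for bi, gij in zip(b, col)) % 2 for col in zip(*G)]


def encode_pm1(x, G):
    """Multiplicative representation (11): y_j = product over i of x_i^{g_ij}."""
    return [prod(xi for xi, gij in zip(x, col) if gij) for col in zip(*G)]


for b in itertools.product((0, 1), repeat=len(G)):
    x = [(-1) ** bi for bi in b]
    assert encode_pm1(x, G) == [(-1) ** v for v in encode_gf2(b, G)]
print("(11) agrees with the GF(2) encoding for all", 2 ** len(G), "information vectors")
```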

Example: Consider the [7,4] systematic Hamming code whose generator matrix is given by

$$G = \begin{pmatrix} 1&0&0&0&0&1&1 \\ 0&1&0&0&1&0&1 \\ 0&0&1&0&1&1&0 \\ 0&0&0&1&1&1&1 \end{pmatrix}$$

Given the 4 information symbols $(b_1, b_2, b_3, b_4)$, the corresponding codeword is

$$v = (b_1, b_2, b_3, b_4, b_2 \oplus b_3 \oplus b_4, b_1 \oplus b_3 \oplus b_4, b_1 \oplus b_2 \oplus b_4)$$

In the $\{1,-1\}$ representation, this becomes

$$y = (x_1, x_2, x_3, x_4, x_2x_3x_4, x_1x_3x_4, x_1x_2x_4)$$

where $x_i = (-1)^{b_i}$.

Definition: In the $\{1,-1\}$ representation of a code, instead of a generator matrix we will use an encoding procedure $x \to y$, which maps an information vector $x = (x_1, x_2, \ldots, x_k)$ to $y = (y_1, y_2, \ldots, y_n)$, where each $y_j = y_j(x_1, x_2, \ldots, x_k)$ is a monomial. An encoding procedure is systematic iff $y_j(x_1, x_2, \ldots, x_k) = x_j$ for $1 \le j \le k$.

In the example, the [7,4] Hamming code is described by the systematic encoding procedure:

$$(x_1, x_2, x_3, x_4) \to (x_1, x_2, x_3, x_4, x_2x_3x_4, x_1x_3x_4, x_1x_2x_4) \qquad (12)$$

As another example, the first order (shortened) Reed-Muller code $R(1,3)$ [15] is described by the systematic encoding procedure:

$$(x_1, x_2, x_3) \to (x_1, x_2, x_3, x_1x_2, x_1x_3, x_2x_3, x_1x_2x_3)$$

while the first order Reed-Muller code $R(1,3)$ is described by the encoding procedure:

$$(x_0, x_1, x_2, x_3) \to (x_0, x_0x_1, x_0x_2, x_0x_1x_2, x_0x_3, x_0x_1x_3, x_0x_2x_3, x_0x_1x_2x_3) \qquad (13)$$

The generalization to any first order Reed-Muller code $R(1,m)$ is straightforward.

Definition: Let $G$ be a $k \times n$ matrix of 1's and 0's. The polynomial representation of $G$ with respect to a vector $\omega \in \{1,-1\}^n$, to be denoted by $E_\omega$, is:

$$E_\omega(x) = \sum_{j=1}^{n} \omega_j \prod_{i=1}^{k} x_i^{g_{i,j}} \qquad (14)$$

Consider the linear block code defined by the generator matrix $G$ (or, equivalently, by the encoding procedure associated with $G$). The polynomial representation of $G$, i.e. $E_\omega(x)$, will be called the energy function of $\omega$ with respect to the encoding procedure $x \to y$. Note that $E_\omega(x) = \omega \cdot y(x)$.

To establish the connection between energy functions and linear block codes, we will prove that finding the global maximum of $E_\omega(x)$ is equivalent to MLD of a vector $\omega$ with respect to the code $C$.


Theorem 4.1 Given an $[n,k]$ code $C$ defined by an encoding procedure $x \to y$, and a vector $\omega \in \{1,-1\}^n$, the closest codeword (in Hamming distance) to $\omega$ in $C$ corresponds to an information vector $b = (b_1, b_2, \ldots, b_k)$ if and only if

$$E_\omega(b) = \max_{x \in \{1,-1\}^k} E_\omega(x)$$


Proof: Notice that for any information vector $x \in \{1,-1\}^k$,

$$\begin{aligned} E_\omega(x) &= \sum_{j=1}^{n} \omega_j\, y_j(x) \\ &= |\{j : \omega_j = y_j(x)\}| - |\{j : \omega_j \neq y_j(x)\}| \\ &= n - 2\,|\{j : \omega_j \neq y_j(x)\}| \\ &= n - 2\, d_H(\omega, y) \end{aligned}$$

where $d_H$ denotes Hamming distance. This expression implies that $E_\omega(b_1, b_2, \ldots, b_k)$ achieves a maximum iff $d_H(\omega, y)$ achieves a minimum. □
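Theorem 4.1 turns MLD into maximizing $E_\omega$ over the $2^k$ information vectors. The brute-force sketch below (exponential in $k$, so only for small codes) uses the systematic [7,4] Hamming generator matrix and the received word of the example that follows; `mld_by_energy` is a hypothetical name.

```python
import itertools
from math import prod

G_HAMMING = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(x, G):
    """y_j = product of the x_i selected by column j of G, as in (11)."""
    return tuple(prod(xi for xi, g in zip(x, col) if g) for col in zip(*G))


def energy(w, x, G):
    """E_w(x) = w . y(x), the polynomial representation (14)."""
    return sum(wj * yj for wj, yj in zip(w, encode(x, G)))


def mld_by_energy(w, G):
    """Return the information vector maximizing E_w (the ML decision by Theorem 4.1)."""
    return max(itertools.product((1, -1), repeat=len(G)), key=lambda x: energy(w, x, G))


w = (1, -1, -1, 1, 1, -1, 1)
b = mld_by_energy(w, G_HAMMING)
print(b, energy(w, b, G_HAMMING))  # (1, -1, -1, 1) 5
```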


Example: Consider the [7,4] Hamming code, defined by the encoding procedure in (12). Assume we want to perform MLD of the received word

$$\omega = (1, -1, -1, 1, 1, -1, 1)$$

The energy function is

$$E_\omega(x_1, x_2, x_3, x_4) = x_1 - x_2 - x_3 + x_4 + x_2x_3x_4 - x_1x_3x_4 + x_1x_2x_4$$

The maximum of this polynomial is $E_\omega(1, -1, -1, 1) = 5$. So, the received word is decoded as $(1, -1, -1, 1)$.

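Example: Consider the $R(1,3)$ first order Reed-Muller code, defined by the encoding procedure in (13). Given the received word $\omega = (\omega_0, \omega_1, \ldots, \omega_7)$, the energy function is

$$\begin{aligned} E_\omega(x_0, x_1, x_2, x_3) &= x_0(\omega_0 + \omega_1 x_1 + \omega_2 x_2 + \omega_3 x_1x_2 + \omega_4 x_3 + \omega_5 x_1x_3 + \omega_6 x_2x_3 + \omega_7 x_1x_2x_3) \\ &= x_0(\omega_0 + E_\omega(x_1, x_2, x_3)) \end{aligned}$$

where

$$E_\omega(x_1, x_2, x_3) = \omega_1 x_1 + \omega_2 x_2 + \omega_3 x_1x_2 + \omega_4 x_3 + \omega_5 x_1x_3 + \omega_6 x_2x_3 + \omega_7 x_1x_2x_3$$

Hence, it is enough to find

$$\max |E_\omega(x_1, x_2, x_3)|$$

If the energy that corresponds to the maximum is positive, then $x_0 = 1$; otherwise, $x_0 = -1$. Assume we receive $\omega = (-1, 1, 1, -1, 1, 1, 1, -1)$. We have

$$E_\omega(x_1, x_2, x_3) = x_1 + x_2 - x_1x_2 + x_3 + x_1x_3 + x_2x_3 - x_1x_2x_3$$

then

$$\max |E_\omega(x_1, x_2, x_3)| = |E_\omega(-1, -1, 1)| = 5$$

Since the energy at the maximum is negative ($E_\omega(-1, -1, 1) = -5$), $x_0 = -1$ and the received word is decoded as $(-1, -1, -1, 1)$. The corresponding codeword $(-1, 1, 1, -1, -1, 1, 1, -1)$ differs from $\omega$ only in the coordinate $\omega_4$; since the code has minimum distance 4, the decoding is unique.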
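The decoding in the $R(1,3)$ example can be cross-checked by exhaustive search over all 16 information vectors of (13), for the received word $\omega = (-1, 1, 1, -1, 1, 1, 1, -1)$ used in the text. This is a brute-force sketch rather than the reduction to $\max |E_\omega(x_1, x_2, x_3)|$; the function names are hypothetical.

```python
import itertools


def rm13_encode(x0, x1, x2, x3):
    """Encoding procedure (13) for R(1,3) in the {1,-1} representation."""
    return (x0, x0 * x1, x0 * x2, x0 * x1 * x2,
            x0 * x3, x0 * x1 * x3, x0 * x2 * x3, x0 * x1 * x2 * x3)


def energy(w, x):
    """E_w(x0, x1, x2, x3) = w . y(x)."""
    return sum(a * b for a, b in zip(w, rm13_encode(*x)))


w = (-1, 1, 1, -1, 1, 1, 1, -1)
best = max(itertools.product((1, -1), repeat=4), key=lambda x: energy(w, x))


def reduced(x1, x2, x3):
    """The bracketed energy E_w(x1, x2, x3), i.e. the energy without w_0 and x_0."""
    return sum(a * b for a, b in zip(w[1:], rm13_encode(1, x1, x2, x3)[1:]))


print("decoded:", best, "energy:", energy(w, best))
print("max |E_w(x1,x2,x3)|:", max(abs(reduced(*x)) for x in itertools.product((1, -1), repeat=3)))
```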

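Given an encoding procedure, we can use the same argument as in Theorem 4.1 to determine the minimum distance of the code.

Consider the encoding procedure

$$x = (x_1, x_2, \ldots, x_k) \to y = (y_1, y_2, \ldots, y_n)$$

and the energy function with $\omega = (1, 1, \ldots, 1)$:

$$E(x_1, x_2, \ldots, x_k) = y_1 + y_2 + \cdots + y_n$$

As before,

$$E(x_1, x_2, \ldots, x_k) = n - 2\, d_H((1, 1, \ldots, 1), (y_1, y_2, \ldots, y_n))$$

Since $x = (1, 1, \ldots, 1)$ is mapped to the all-1 codeword and the code is linear, the nonzero codeword of minimum weight occurs at

$$M = \max_{x \neq (1, 1, \ldots, 1)} E(x_1, x_2, \ldots, x_k)$$

So, the conclusion is that $d^*$ (the minimum distance of the code) is given by

$$d^* = \frac{n - M}{2} \qquad (15)$$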


(15) 


Example: For the [7,4] Hamming code,

$$M = \max_{x \neq (1,1,1,1)} \left( x_1 + x_2 + x_3 + x_4 + x_2x_3x_4 + x_1x_3x_4 + x_1x_2x_4 \right) = 1$$

so

$$d^* = \frac{7 - 1}{2} = 3$$

Example: For the $R(1,3)$ first order Reed-Muller code,

$$M = \max_{x \neq (1,1,1,1)} x_0(1 + x_1 + x_2 + x_1x_2 + x_3 + x_1x_3 + x_2x_3 + x_1x_2x_3)$$

which can be written as

$$M = \max_{x \neq (1,1,1,1)} x_0(1 + x_1)(1 + x_2)(1 + x_3) \qquad (16)$$

The maximum in (16) is $M = 0$: the product $(1 + x_1)(1 + x_2)(1 + x_3)$ vanishes unless $x_1 = x_2 = x_3 = 1$, and in that case $x_0$ must be $-1$, which gives the value $-8$. Thus,

$$d^* = \frac{8 - 0}{2} = 4$$

The same argument can be used for any $R(1,m)$ code, giving

$$d^* = \frac{2^m - 0}{2} = 2^{m-1}$$
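Formula (15) can be evaluated directly by enumerating all information vectors other than $(1, \ldots, 1)$. The sketch below does this for the Hamming procedure (12) and for $R(1,m)$ with small $m$; it is exponential in $k$, and the encoder names are hypothetical.

```python
import itertools
from math import prod


def min_distance(encode, k):
    """d* = (n - M) / 2, equation (15), with M = max E(x) over x != (1, ..., 1)."""
    ones = (1,) * k
    n = len(encode(ones))
    M = max(sum(encode(x)) for x in itertools.product((1, -1), repeat=k) if x != ones)
    return (n - M) // 2


def hamming_encode(x):
    """Encoding procedure (12) of the [7,4] Hamming code."""
    x1, x2, x3, x4 = x
    return (x1, x2, x3, x4, x2 * x3 * x4, x1 * x3 * x4, x1 * x2 * x4)


def rm1_encoder(m):
    """Encoding procedure of R(1, m): x_0 times each monomial in x_1..x_m, natural order."""
    def encode(x):
        x0, rest = x[0], x[1:]
        return tuple(x0 * prod(rest[i] for i in range(m) if (s >> i) & 1)
                     for s in range(2 ** m))
    return encode


print("Hamming [7,4]:", min_distance(hamming_encode, 4))
for m in range(1, 5):
    print(f"R(1,{m}):", min_distance(rm1_encoder(m), m + 1))
```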


4.2  Generalization  to  Non-Binary  Codes 

Consider now a linear $[n,k]$ error-correcting code over a field GF($p$), $p$ a prime. Let $G$ be the generator matrix of the code. Then $k$ symbols $(b_1, b_2, \ldots, b_k)$ in $Z_p$ are encoded into the codeword $v = (v_1, v_2, \ldots, v_n)$ as follows:

$$v_j = \sum_{m=1}^{k} b_m\, g_{m,j} \pmod p, \qquad 1 \le j \le n$$

Again, the key idea is to use the multiplicative representation: let $u$ be the $p$th root of unity

$$u = e^{2\pi i/p}$$

The additive group $Z_p$ can be represented as the multiplicative group of $p$th roots of unity through the transformation $a \to u^a$.

In the multiplicative representation, the $k$ information symbols $(b_1, b_2, \ldots, b_k)$ are represented as

$$(x_1, x_2, \ldots, x_k) = (u^{b_1}, u^{b_2}, \ldots, u^{b_k})$$

so the encoded codeword $v = (v_1, v_2, \ldots, v_n)$ is represented as $y = (y_1, y_2, \ldots, y_n)$, where

$$y_j = u^{v_j} = u^{\sum_{m=1}^{k} b_m g_{m,j}} = \prod_{m=1}^{k} x_m^{g_{m,j}}$$


Example: Consider the [4,2] ternary Hamming code whose generator matrix is

$$G = \begin{pmatrix} 1&0&2&0 \\ 0&1&2&1 \end{pmatrix}$$

Given the two information symbols $(b_1, b_2)$, the corresponding codeword is

$$v = (b_1, b_2, 2b_1 + 2b_2 \pmod 3, b_2)$$

In the multiplicative representation, this becomes

$$y = (x_1, x_2, x_1^2 x_2^2, x_2)$$

where $x_j = u^{b_j}$, $u = e^{2\pi i/3}$.


Hence, as in the binary case, we can represent a code over a field with $p$ elements ($p$ prime) by an encoding procedure. The elements are now $p$th roots of unity. So, given $k$ information symbols, we have the one-to-one assignment

$$x = (x_1, x_2, \ldots, x_k) \to y = (y_1, y_2, \ldots, y_n)$$

where $y_j = y_j(x_1, x_2, \ldots, x_k)$ is a monomial.

We will show that for $p = 3$ or $5$, there are simple expressions of the energy function that generalize the binary case ($p = 2$).

We start by redefining the energy function. Given an encoding procedure

$$x = (x_1, x_2, \ldots, x_k) \to y = (y_1, y_2, \ldots, y_n)$$

and a vector $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ whose entries are $p$th roots of unity, we define the energy function as follows:

$$E_\omega(x) = \lfloor \Re(\bar\omega_1 y_1) \rfloor + \lfloor \Re(\bar\omega_2 y_2) \rfloor + \cdots + \lfloor \Re(\bar\omega_n y_n) \rfloor \qquad (17)$$

where $\Re(x)$ denotes the real part of $x$, $\lfloor x \rfloor$ the integer part (floor) of $x$, and $\bar x$ the complex conjugate of $x$.

Notice that this energy function coincides with the binary energy function (14) in the binary case (in that case, $u = -1$).

Before proceeding further, let us recall the definition of Lee distance [17].


Definition: The Lee weight of an $n$-tuple $(a_1, a_2, \ldots, a_n)$, $a_i \in Z_p$, $p$ a prime, is defined as

$$w_L = \sum_{i=1}^{n} |a_i|$$

where

$$|a_i| = \begin{cases} a_i, & 0 \le a_i \le \frac{p}{2} \\ p - a_i, & \frac{p}{2} < a_i \le p - 1 \end{cases}$$

The Lee distance between two $n$-tuples is defined as the Lee weight of their difference.

We study the cases $p = 3$ and $p = 5$. From now on, $x \to y$ denotes an encoding procedure that defines a code $C$, and $x$ and $y$ are vectors of length $k$ and $n$, respectively, of 3rd or 5th roots of unity.

We are going to prove two theorems. The first one is similar to Theorem 4.1. It states that MLD in a ternary code is equivalent to the maximization of the energy function in (17). The second theorem states something similar for codes over the 5th roots of unity, but with respect to the Lee distance.

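Theorem 4.2 Let $p = 3$ and let $a \to b$ (that is, the information vector $a$ is encoded into the codeword $b$); then $b$ is the closest codeword (in Hamming distance) to a word $\omega$ if and only if

$$E_\omega(a) = \max_x E_\omega(x)$$

Proof: Similar to that of Theorem 4.1. For $p = 3$, $\Re(u) = \Re(u^2) = -\frac{1}{2}$, so $\lfloor \Re(\bar\omega_j y_j) \rfloor$ equals 1 if $y_j = \omega_j$ and $-1$ otherwise; hence $E_\omega(x) = n - 2\, d_H(\omega, y)$. □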

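Example: Consider again the [4,2] ternary Hamming code. Assume $\omega = (u, u^2, 1, u)$ is received ($u = e^{2\pi i/3}$); then

$$E_\omega(x_1, x_2) = \lfloor \Re(u^2 x_1) \rfloor + \lfloor \Re(u x_2) \rfloor + \lfloor \Re(x_1^2 x_2^2) \rfloor + \lfloor \Re(u^2 x_2) \rfloor$$

It can be easily verified that $\max E_\omega(x_1, x_2) = E_\omega(u, u^2) = 2$, so $\omega$ is decoded as $(u, u^2)$.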
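The ternary example can be verified by representing every root of unity by its exponent, so that $\bar\omega_j y_j = u^{e_j}$ with $e_j$ the exponent of $y_j$ minus that of $\omega_j$. Reducing exponents mod $p$ before taking the cosine keeps $\lfloor \Re(1) \rfloor$ exactly 1 in floating point. This is an illustrative sketch; the helper names are hypothetical.

```python
import itertools
import math

P = 3


def floor_re(e: int, p: int) -> int:
    """floor(Re(u^e)) for u = exp(2*pi*i/p); the exponent is reduced mod p first."""
    return math.floor(math.cos(2 * math.pi * (e % p) / p))


def encode(a):
    """(x1, x2) -> (x1, x2, x1^2 x2^2, x2), every symbol stored as its exponent of u."""
    a1, a2 = a
    return (a1, a2, (2 * a1 + 2 * a2) % P, a2)


def energy(w, a):
    """Energy (17): conj(w_j) * y_j = u^(y_j - w_j)."""
    return sum(floor_re(y - wj, P) for wj, y in zip(w, encode(a)))


w = (1, 2, 0, 1)  # received word (u, u^2, 1, u), as exponents
best = max(itertools.product(range(P), repeat=2), key=lambda a: energy(w, a))
print(best, energy(w, best))  # (1, 2) 2, i.e. decoded as (u, u^2)
```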


Theorem 4.3 Let $p = 5$ and $a \to b$; then $b$ is the closest codeword (in Lee distance) to a word $\omega$ if and only if

$$E_\omega(a) = \max_x E_\omega(x)$$
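Proof: Using the definition of the energy function, and since $\lfloor \Re(u) \rfloor = \lfloor \Re(u^4) \rfloor = 0$ and $\lfloor \Re(u^2) \rfloor = \lfloor \Re(u^3) \rfloor = -1$,

$$\begin{aligned} E_\omega(a) &= |\{j : \bar\omega_j b_j = 1\}| - |\{j : \bar\omega_j b_j = u^2 \text{ or } u^3\}| \\ &= n - |\{j : \bar\omega_j b_j = u \text{ or } u^4\}| - 2\,|\{j : \bar\omega_j b_j = u^2 \text{ or } u^3\}| \\ &= n - d_L(\omega, b) \end{aligned}$$

where $d_L$ denotes Lee distance.

Hence, $E_\omega(a)$ reaches a maximum if and only if $d_L(\omega, b)$ reaches a minimum. □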

Example: Consider the [6,2] code over $Z_5$ generated by

$$G = \begin{pmatrix} 1&2&3&4&1&0 \\ 1&1&1&1&0&1 \end{pmatrix}$$

The corresponding encoding procedure, taking the symbols as 5th roots of unity, is given by

$$(x_1, x_2) \to (x_1x_2, x_1^2x_2, x_1^3x_2, x_1^4x_2, x_1, x_2)$$

Assume $\omega = (u^2, u^4, 1, u^3, u, 1)$ is received, where $u = e^{2\pi i/5}$. The energy function is then

$$E_\omega(x_1, x_2) = \lfloor \Re(u^3 x_1x_2) \rfloor + \lfloor \Re(u\, x_1^2x_2) \rfloor + \lfloor \Re(x_1^3x_2) \rfloor + \lfloor \Re(u^2 x_1^4x_2) \rfloor + \lfloor \Re(u^4 x_1) \rfloor + \lfloor \Re(x_2) \rfloor$$

It can be verified that the maximum occurs at $E_\omega(u^2, 1) = 4$, so $\omega$ is decoded as $(u^2, 1)$.
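The same exponent-based check applies to the [6,2] code over $Z_5$, and it also confirms the identity $E_\omega = n - d_L$ from the proof of Theorem 4.3 for every information vector. The helper names are hypothetical; exponents are reduced mod 5 before taking cosines.

```python
import itertools
import math

P = 5
G = [(1, 2, 3, 4, 1, 0), (1, 1, 1, 1, 0, 1)]


def floor_re(e: int, p: int) -> int:
    """floor(Re(u^e)) for u = exp(2*pi*i/p); the exponent is reduced mod p first."""
    return math.floor(math.cos(2 * math.pi * (e % p) / p))


def lee(e: int, p: int) -> int:
    """Lee weight of the symbol e in Z_p."""
    e %= p
    return min(e, p - e)


def encode(a):
    """Exponents of y_j = x1^{g_1j} x2^{g_2j}, i.e. v_j = a1 g_1j + a2 g_2j (mod 5)."""
    return tuple((a[0] * g1 + a[1] * g2) % P for g1, g2 in zip(*G))


def energy(w, a):
    return sum(floor_re(y - wj, P) for wj, y in zip(w, encode(a)))


w = (2, 4, 0, 3, 1, 0)  # received (u^2, u^4, 1, u^3, u, 1), as exponents
best = max(itertools.product(range(P), repeat=2), key=lambda a: energy(w, a))
print(best, energy(w, best))  # (2, 0) 4, i.e. decoded as (u^2, 1)
```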

5 Representing Linear Codes as Stable States of Energy Functions

Let $C$ be a linear block code (over GF(2)) defined by the generator matrix $G$. Let $E_C$ be a polynomial over $\{1,-1\}^n$ (an energy function) with the property that every local maximum of $E_C$ corresponds to a codeword in $C$ and every codeword in $C$ corresponds to a local maximum of $E_C$.

Consider the following question: given a code $C$ defined by $G$, is there an efficient algorithm to construct $E_C$? This section describes the development of such an algorithm.

Consider the $[n,k]$ linear block code $C$. Without loss of generality, let us assume that the generator matrix $G$ is given in systematic form; that is,

$$G = [I_k : P] \qquad (18)$$

where $I_k$ is the $k \times k$ identity matrix and $P$ is a $k \times (n-k)$ matrix. The parity check matrix of $C$ is

$$H = [P^T : I_{n-k}] \qquad (19)$$


By the definition of $H$, for all $X \in C$,

$$X H^T = 0 \qquad (20)$$

where 0 in (20) is the all-zero vector of length $n - k$. Equation (20) can be written using the polynomial representation devised in (14), with the vector of coefficients being the all-1 vector.

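Lemma 5.1 Let $E(X)$ be the polynomial representation of $H^T$ with respect to the all-1 vector. Then $X \in C$ iff $E(X) = n - k$.

Proof: $E$ has $n - k$ terms, and all the coefficients are equal to 1. Each term is $\pm 1$, and it equals 1 iff the corresponding parity check is satisfied. Hence, $E = n - k$ iff all the terms are equal to 1, i.e., iff (20) holds. □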

The lemma ensures that in the polynomial $E$ every codeword corresponds to a global maximum (stable state). But does every local maximum correspond to a codeword?

Theorem 5.1 Let $C$ be a linear block code, with $G$, $H$, $E_C$ and $E$ as defined above; then $E$ is a polynomial with the properties of $E_C$. That is, $X$ corresponds to a local maximum of $E$ iff $X \in C$.

Proof: One direction follows from the lemma. The global maximum of $E$ is $n - k$; thus, every codeword is a global (and hence a local) maximum.

The second direction follows from the fact that $H$ is in systematic form. The last $n - k$ variables in $E$, that is, $X_{k+1}, \ldots, X_n$, appear in only one term each: $X_{k+1}$ appears only in the first term, $X_{k+2}$ appears only in the second term, and so on. Assume there exists a vector $V$ that corresponds to a local maximum which is not global. That is, $E(V) = L$, where $L < n - k$. Hence, there exists at least one term in $E(V)$ that is not 1. But this term can be made 1 by flipping the value of the variable that appears only in this term, which increases $E$ by 2. This contradicts the fact that $V$ is a local maximum. □

Examples:

1. Consider the single parity check code; it is an $[n, n-1]$ code and

$$G = [I_{n-1} : 1_{n-1}], \qquad H^T = 1_n$$

where $1_n$ is the all-1 (column) vector of length $n$. Hence,

$$E(X) = X_1 X_2 \cdots X_n$$

It is clear that $E(X) = 1$ iff $X \in C$. Also, $E(X) = -1$ for all $X \notin C$. Thus, the local maxima of $E$ are in one-to-one correspondence with the codewords in $C$.
2. Consider the simple repetition code; it is an $[n, 1]$ code and

$$G = [1, 1, \ldots, 1], \qquad H = [1_{n-1} : I_{n-1}]$$

and

$$E(X) = X_1(X_2 + X_3 + \cdots + X_n)$$

It is clear that there are two stable states in $E$ above: the all-1 and the all-(-1) vectors.

3. Consider the [7,4] Hamming code (see also Section 4):

$$H^T = \begin{pmatrix} 1&1&0 \\ 1&0&1 \\ 0&1&1 \\ 1&1&1 \\ 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix} \qquad (21)$$

$$E(X) = X_1X_2X_4X_5 + X_1X_3X_4X_6 + X_2X_3X_4X_7 \qquad (22)$$

Again, the polynomial in (22) has the [7,4] Hamming code as the set of its local maxima.
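The last claim can be checked exhaustively over the $2^7$ vectors in $\{1,-1\}^7$: a vector is counted as a local maximum when no single-variable flip increases $E$ (a stable state of serial mode). This is an illustrative sketch with 0-based variable indices.

```python
import itertools

# Monomials of (22) as 0-based variable indices: X1X2X4X5 + X1X3X4X6 + X2X3X4X7.
MONOMIALS = [(0, 1, 3, 4), (0, 2, 3, 5), (1, 2, 3, 6)]


def E(X):
    total = 0
    for m in MONOMIALS:
        term = 1
        for i in m:
            term *= X[i]
        total += term
    return total


def is_local_max(X):
    """True if no single-variable flip increases E."""
    base = E(X)
    for i in range(len(X)):
        flipped = list(X)
        flipped[i] = -flipped[i]
        if E(flipped) > base:
            return False
    return True


local_maxima = [X for X in itertools.product((1, -1), repeat=7) if is_local_max(X)]
print(len(local_maxima), {E(X) for X in local_maxima})
```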


To summarize, given a linear code $C$, the algorithm for constructing a polynomial $E_C$ is as follows:

1. Construct the systematic generator matrix of $C$ by performing row operations on the generator matrix $G$.

2. Construct the systematic parity check matrix of $C$, according to (19).

3. Construct $E$, the polynomial representation of $H^T$ with respect to the all-1 vector. By Theorem 5.1, let $E_C = E$.
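The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the first $k$ columns of $G$ are linearly independent (otherwise a column permutation would also be needed), and all function names are hypothetical. Monomials are returned as lists of 0-based variable indices.

```python
from typing import List

Matrix = List[List[int]]


def systematic_form(G: Matrix) -> Matrix:
    """Step 1: row-reduce a binary k x n matrix over GF(2) to [I_k : P].

    Assumes the first k columns are linearly independent; no column permutation is tried.
    """
    A = [row[:] for row in G]
    k = len(A)
    for col in range(k):
        pivot = next((r for r in range(col, k) if A[r][col] == 1), None)
        if pivot is None:
            raise ValueError("first k columns are not linearly independent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col and A[r][col] == 1:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return A


def parity_check(Gs: Matrix) -> Matrix:
    """Step 2: H = [P^T : I_{n-k}] from a systematic G = [I_k : P], as in (19)."""
    k, n = len(Gs), len(Gs[0])
    return [[Gs[i][k + j] for i in range(k)] + [int(t == j) for t in range(n - k)]
            for j in range(n - k)]


def energy_monomials(H: Matrix) -> List[List[int]]:
    """Step 3: row j of H (column j of H^T) gives the monomial of the X_i with H[j][i] = 1."""
    return [[i for i, h in enumerate(row) if h] for row in H]


def energy(monomials: List[List[int]], X: List[int]) -> int:
    """Evaluate E(X) = sum over monomials of the product of the selected X_i in {1, -1}."""
    total = 0
    for m in monomials:
        term = 1
        for i in m:
            term *= X[i]
        total += term
    return total


# The [7,4] Hamming generator matrix of Section 4 with row 2 added to row 1,
# so that step 1 has some work to do.
G = [
    [1, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
E_C = energy_monomials(parity_check(systematic_form(G)))
print(E_C)  # [[1, 2, 3, 4], [0, 2, 3, 5], [0, 1, 3, 6]]
```

The result corresponds to $X_2X_3X_4X_5 + X_1X_3X_4X_6 + X_1X_2X_4X_7$, which matches the parity positions of the encoding procedure (12).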

A few remarks and generalizations regarding the above development:

1. The construction described above also works for cosets of linear codes. Let $\omega$ be the vector of length $n - k$ of the coefficients of $E$. In the above construction we chose $\omega$ to be the all-1 vector, and obtained $E_C = E$. Let $C'$ be a coset of $C$, and let $S$ be the syndrome that corresponds to $C'$. It can be proven (essentially as in the proof of Theorem 5.1) that there is a one-to-one correspondence between the local maxima of the polynomial representation of $H^T$ with $\omega = S$ and the vectors in the coset $C'$.

Clearly, the syndrome that corresponds to the code $C$ is the all-1 vector (recall that in the transformation of Section 4, 0 goes to 1).

2. The above construction (with respect to the one suggested in Section 4) is a dual way of defining the MLD problem. Consider a linear block code $C$, defined by its parity check matrix $H$. Given a vector $V$, the MLD problem can be defined as finding the local maximum of $E_C$ that is closest to $V$. Or, equivalently, finding a local maximum of the energy function associated with the syndrome (corresponding to $V$) that is achieved by a vector of minimum weight.


6  Boolean  Functions,  Polynomials  and  Codes 

In this section the representation of boolean functions as polynomials over the field of real numbers is investigated. In view of the results in Section 4, applications of the derived representation to coding theory are also investigated. Although part of the material in this section is known (see, for example, [15,16]), we include the detailed derivation, as we believe that it is novel with regard to the mode of presentation. Let us start with some definitions and notation.

Definition: A boolean function $f$ on $n$ variables is a mapping

$$f : \{0,1\}^n \to \{0,1\}$$

As in Section 4, it is useful to define boolean functions using the symbols '1' and '-1' instead of the symbols '0' and '1', respectively.

Definition: A Hadamard matrix of order $m$, to be denoted by $H_m$, is an $m \times m$ matrix of $+1$'s and $-1$'s such that:

$$H_m H_m^T = m I_m \qquad (23)$$

where $I_m$ is the $m \times m$ identity matrix. The above definition is equivalent to saying that any two distinct rows of $H_m$ are orthogonal.

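Hadamard matrices of order $2^k$ exist for all $k \ge 0$. The so-called Sylvester construction is as follows:

$$H_1 = [1], \qquad H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad H_{2^{n+1}} = \begin{pmatrix} H_{2^n} & H_{2^n} \\ H_{2^n} & -H_{2^n} \end{pmatrix} \qquad (24)$$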
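The recursion (24) translates directly into code. The sketch below builds $H_{2^n}$ and checks property (23) for small $n$; the function names are hypothetical.

```python
def sylvester(n: int) -> list:
    """H_{2^n} via (24): start from H_1 = [1] and apply H -> [[H, H], [H, -H]] n times."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def is_hadamard(H) -> bool:
    """Check (23): H H^T = m I_m."""
    m = len(H)
    return all(
        sum(H[r][c] * H[s][c] for c in range(m)) == (m if r == s else 0)
        for r in range(m) for s in range(m)
    )


print(sylvester(1))
print(all(is_hadamard(sylvester(n)) for n in range(6)))
```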


Definition: Given a boolean function $f$ of order $n$, $P_f$ is a polynomial (with coefficients over the field of real numbers) equivalent to $f$ iff for all $X \in \{1,-1\}^n$:

$$f(X) = P_f(X)$$

Problem: Given a boolean function $f$ of order $n$, compute $P_f$, a polynomial that is equivalent to $f$.




As an example, let $f = x_1 \oplus x_2$; that is, $f$ is the XOR function of two variables. It is easy to check that in the $\{1,-1\}$ representation $P_f = x_1x_2$.

Notice that for every boolean function $f$, the polynomial $P_f$ can be taken to be linear in each of its variables, because $x^2 = 1$ for $x \in \{-1,1\}$. It turns out that every boolean function has a unique representation as such a polynomial. This representation is derived by using the Hadamard matrix, as described by the following theorem.

Theorem 6.1 Let $f$ be a boolean function of order $n$. Let $P_f$ be a polynomial equivalent to $f$. Let $A$ denote the vector of coefficients of $P_f$. Let $P$ denote the vector of the $2^n$ values of $P_f$ (and $f$). Then:

1. The polynomial $P_f$ always exists and is unique.

2. The coefficients of $P_f$ are computed as follows:

$$A = \frac{1}{2^n} H_{2^n} P$$


Proof: The proof is constructive. The idea is to compute $A$ by solving a system of linear equations. Let us start by computing the coefficients of $P_f$ for $f$ a function of one variable:

$$P_f = a_0 + a_1 x_1$$

and

$$P_f(1) = a_0 + a_1, \qquad P_f(-1) = a_0 - a_1$$

Clearly,

$$P = H_2 A \qquad (25)$$

and by (23),

$$A = \frac{1}{2} H_2 P \qquad (26)$$

Claim. The above result can be generalized to $n$ variables as follows:

$$P = H_{2^n} A \qquad (27)$$

The proof is by induction. The case $n = 1$ was proven above. Assume (27) is true for $n$. Clearly, every polynomial of $n + 1$ variables can be written as a combination of two polynomials of $n$ variables each:

$$P_f(x_1, \ldots, x_{n+1}) = P_f^{(1)}(x_1, \ldots, x_n) + x_{n+1} P_f^{(2)}(x_1, \ldots, x_n) \qquad (28)$$
Figure 2: The truth table of $f$.


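There are two possibilities: either $x_{n+1} = 1$ or $x_{n+1} = -1$. Hence, by the induction hypothesis (27), the system of linear equations for $n + 1$ variables becomes:

$$P = \begin{pmatrix} H_{2^n} & H_{2^n} \\ H_{2^n} & -H_{2^n} \end{pmatrix} \begin{pmatrix} A^{(1)} \\ A^{(2)} \end{pmatrix} \qquad (29)$$

where $A^{(1)}$ and $A^{(2)}$ are the coefficient vectors of the two polynomials in (28). Following the recursive definition of Hadamard matrices (24),

$$P = H_{2^{n+1}} A \qquad (30)$$

Hadamard matrices are nonsingular; thus, for any given $f$, a unique $P_f$ (defined by the vector of coefficients $A$) always exists. □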

Example: Consider the function $f$ of 3 variables,

$$f = (x_1 \wedge x_3) \vee (x_1 \wedge x_2) \qquad (31)$$

The truth table of $f$ appears in Figure 2 (note that a logical 0 is mapped to 1, and a logical 1 is mapped to -1). By Theorem 6.1,

$$P_f = \frac{1}{8}\left(2 + 6x_1 + 2x_2 - 2x_1x_2 + 2x_3 - 2x_1x_3 + 2x_2x_3 - 2x_1x_2x_3\right) \qquad (32)$$
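Theorem 6.1 gives a direct recipe: list the $\pm 1$ values of $f$ in natural order and multiply by $2^{-n} H_{2^n}$. The sketch below applies it to the function (31) with exact fractions. As in the text, logical 0 is mapped to 1 and logical 1 to $-1$; the helper names are hypothetical.

```python
from fractions import Fraction


def sylvester(n):
    """H_{2^n} built by the recursion (24)."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def truth_vector(f, n):
    """Values of f in natural order, mapped to {1, -1}.

    Entry t is the point whose i-th variable has logical value bit i of t,
    so x_1 is the least significant position.
    """
    return [-1 if f(*[(t >> i) & 1 for i in range(n)]) else 1 for t in range(2 ** n)]


def coefficients(P):
    """A = 2^{-n} H_{2^n} P (Theorem 6.1).

    Entry s is the coefficient of the monomial whose variables are the set bits of s.
    """
    m = len(P)
    H = sylvester(m.bit_length() - 1)
    return [Fraction(sum(H[s][t] * P[t] for t in range(m)), m) for s in range(m)]


def f31(x1, x2, x3):
    """The function (31), on logical values 0/1."""
    return (x1 and x3) or (x1 and x2)


A = coefficients(truth_vector(f31, 3))
print([int(8 * a) for a in A])  # [2, 6, 2, -2, 2, -2, 2, -2], matching (32)
```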


A few remarks concerning the above method:

1. Special care should be taken with respect to the order in which the values of $f$ are specified in $P$. The function $f$ should be specified according to the natural order, with the highest index being the most significant bit (as in Figure 2).

2. A monomial can be described by a vector of 1's and -1's, with a variable appearing in the monomial iff it corresponds to -1. For example, $(-1, -1, 1)$ corresponds to $x_3x_2$. Using this description, the monomials of $P_f$ appear according to the natural order, with the highest index being the most significant bit.

In (32) the terms are written according to the order in which they appear in $A$ for 3 variables. This order will be denoted as the natural order.

3. The above method is applicable not just to boolean functions but to any function of the form $f : \{1,-1\}^n \to \mathbb{R}$.

The representation theory developed above also holds if one is interested in finding an equivalent polynomial, over $\{0,1\}$, of a boolean function. To see this, simply observe that any monomial over $\{1,-1\}$ can be written as a polynomial over $\{0,1\}$ by the change of variable $x = 1 - 2u$, as follows:

$$ \prod_{i=1}^{k} x_i = 1 + \sum_{i=1}^{k} (-2)^i \sum_{S_i} \prod_{j \in S_i} u_j \tag{33} $$

where the inner sum runs over all subsets $S_i$ of $\{1, \ldots, k\}$ with $i$ elements.

For example,

$$ x_1x_2x_3 = 1 - 2(u_1 + u_2 + u_3) + 4(u_1u_2 + u_1u_3 + u_2u_3) - 8u_1u_2u_3 $$


The question is: what is the form of the matrix that transforms a boolean function into its equivalent 0-1 polynomial? To answer this question, we use the same technique as in Theorem 6.1; that is, we define the transformation recursively.


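Lemma 6.1 Let $A$ be the vector of the coefficients of a polynomial over $\{1,-1\}^n$, with the coefficients ordered according to the so-called natural order. Let $\tilde{A}$ be the vector of the coefficients of the polynomial over $\{0,1\}^n$ that is equal to the polynomial associated with $A$. Then

$$ \tilde{A} = F_{2^n} A $$

where $F_{2^n}$ is defined recursively as follows:

$$ F_1 = [1], \qquad F_2 = \begin{bmatrix} 1 & 1 \\ 0 & -2 \end{bmatrix}, \qquad F_{2^{n+1}} = \begin{bmatrix} F_{2^n} & F_{2^n} \\ 0 & -2F_{2^n} \end{bmatrix} \tag{34} $$

And also,

$$ F_1^{-1} = [1], \qquad F_2^{-1} = \begin{bmatrix} 1 & 0.5 \\ 0 & -0.5 \end{bmatrix}, \qquad F_{2^{n+1}}^{-1} = \begin{bmatrix} F_{2^n}^{-1} & 0.5F_{2^n}^{-1} \\ 0 & -0.5F_{2^n}^{-1} \end{bmatrix} \tag{35} $$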
Proof: The proof is by induction, using the same arguments as in Theorem 6.1. □
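Lemma 6.1 can be checked numerically. This is a minimal sketch under our own conventions, not the authors' code, and the function names are hypothetical. It builds $F_{2^n}$ and $F_{2^n}^{-1}$ from (34) and (35), applies $F_8$ to the coefficient vector of (32), and confirms that under $x = 1 - 2u$ the two polynomials agree at every point of $\{0,1\}^3$. It uses `fractions.Fraction` so that the arithmetic is exact.

```python
from fractions import Fraction
from itertools import product


def f_matrix(n):
    """F_{2^n} from (34): [[F, F], [0, -2F]] starting from F_1 = [1]."""
    f = [[Fraction(1)]]
    for _ in range(n):
        size = len(f)
        top = [row + row for row in f]
        bottom = [[Fraction(0)] * size + [-2 * v for v in row] for row in f]
        f = top + bottom
    return f


def f_inverse(n):
    """F_{2^n}^{-1} from (35): [[G, G/2], [0, -G/2]] starting from [1]."""
    g = [[Fraction(1)]]
    for _ in range(n):
        size = len(g)
        top = [row + [v / 2 for v in row] for row in g]
        bottom = [[Fraction(0)] * size + [-v / 2 for v in row] for row in g]
        g = top + bottom
    return g


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def evaluate(coeffs, point):
    """Polynomial in natural order: coeffs[idx] multiplies the product of
    point[i] over the bits i that are set in idx."""
    total = Fraction(0)
    for idx, c in enumerate(coeffs):
        term = c
        for i, value in enumerate(point):
            if (idx >> i) & 1:
                term *= value
        total += term
    return total


# Coefficients of (32) over {1,-1}, in natural order.
A = [Fraction(c, 8) for c in (2, 6, 2, -2, 2, -2, 2, -2)]
A01 = matvec(f_matrix(3), A)            # coefficients over {0,1}
print([int(c) for c in A01])            # [1, 0, 0, -2, 0, -2, 0, 2]
for u in product((0, 1), repeat=3):
    x = [1 - 2 * ui for ui in u]        # change of variable x = 1 - 2u
    assert evaluate(A, x) == evaluate(A01, u)
# (35) really is the inverse of (34).
assert matmul(f_matrix(3), f_inverse(3)) == [[int(i == j) for j in range(8)] for i in range(8)]
```

The printed vector says that, over $\{0,1\}$, the function (31) is $1 - 2u_1u_2 - 2u_1u_3 + 2u_1u_2u_3$, that is, $1 - 2[(u_1 \wedge u_3) \vee (u_1 \wedge u_2)]$.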

Using the above lemma, we can formulate the following theorem, which is the equivalent of Theorem 6.1 for the $\{0,1\}$ case.

Theorem 6.2 Let $f$ be a boolean function of order $n$. Let $P_f$ be a polynomial over $\{0,1\}^n$ that is equivalent to $f$. Then a unique $P_f$ always exists and is computed as described in the following proof.


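Proof: The existence and uniqueness of $P_f$ follows from Theorem 6.1. By Theorem 6.1 and Lemma 6.1,

$$ \tilde{A} = F_{2^n} H_{2^n}^{-1} P \tag{36} $$

Let

$$ \tilde{H}_{2^n} \stackrel{\text{def}}{=} F_{2^n} H_{2^n}^{-1} \tag{37} $$

so that $\tilde{A} = \tilde{H}_{2^n} P$. Then, by the recursive definitions of $F$ (34) and $H$ (24), and since $H_m^{-1} = \frac{1}{m} H_m$,

$$ \tilde{H}_{2^{n+1}} = \begin{bmatrix} F_{2^n} & F_{2^n} \\ 0 & -2F_{2^n} \end{bmatrix} \frac{1}{2} \begin{bmatrix} H_{2^n}^{-1} & H_{2^n}^{-1} \\ H_{2^n}^{-1} & -H_{2^n}^{-1} \end{bmatrix} \tag{38} $$

Performing the multiplication above in blocks results in

$$ \tilde{H}_{2^{n+1}} = \begin{bmatrix} F_{2^n} H_{2^n}^{-1} & 0 \\ -F_{2^n} H_{2^n}^{-1} & F_{2^n} H_{2^n}^{-1} \end{bmatrix} $$

Thus, the recursive definition of $\tilde{H}$ is as follows:

$$ \tilde{H}_1 = [1], \qquad \tilde{H}_2 = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}, \qquad \tilde{H}_{2^{n+1}} = \begin{bmatrix} \tilde{H}_{2^n} & 0 \\ -\tilde{H}_{2^n} & \tilde{H}_{2^n} \end{bmatrix} \tag{39} $$

□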


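The inverse of $\tilde{H}$ can be derived by using the recursive definition of $\tilde{H}$:

$$ \tilde{H}_1^{-1} = [1], \qquad \tilde{H}_2^{-1} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \qquad \tilde{H}_{2^{n+1}}^{-1} = \begin{bmatrix} \tilde{H}_{2^n}^{-1} & 0 \\ \tilde{H}_{2^n}^{-1} & \tilde{H}_{2^n}^{-1} \end{bmatrix} \tag{40} $$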
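The recursions (39) and (40) make it possible to go from a truth table to a 0-1 polynomial without computing any Hadamard matrix. This is a minimal sketch, with hypothetical names and not taken from the text. It builds the matrix of (39) and its inverse (40), then applies $\tilde{A} = \tilde{H}_{2^n} P$ to the truth table of (31) written in natural order. Here $u_i$ denotes the logical value of $x_i$.

```python
def h_tilde(n):
    """Matrix of (39): [[T, 0], [-T, T]] starting from T_1 = [1]."""
    t = [[1]]
    for _ in range(n):
        size = len(t)
        top = [row + [0] * size for row in t]
        bottom = [[-v for v in row] + row for row in t]
        t = top + bottom
    return t


def h_tilde_inverse(n):
    """Inverse matrix of (40): [[T, 0], [T, T]] starting from [1]."""
    t = [[1]]
    for _ in range(n):
        size = len(t)
        t = [row + [0] * size for row in t] + [row + row for row in t]
    return t


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Truth table of (31) in natural order, with u_i the logical value of x_i.
P = []
for c in range(8):
    u1, u2, u3 = c & 1, (c >> 1) & 1, (c >> 2) & 1
    P.append(1 - 2 * ((u1 & u3) | (u1 & u2)))   # logical 0 -> 1, 1 -> -1

A01 = matvec(h_tilde(3), P)
print(A01)                                       # [1, 0, 0, -2, 0, -2, 0, 2]
# Applying (40) evaluates the 0-1 polynomial on the cube, giving P back.
assert matvec(h_tilde_inverse(3), A01) == P
```

All the matrices involved have integer entries, so no rational arithmetic is needed here.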


To summarize, we derived the following transformations and presented them in a recursive form:

1. From a boolean function to an equivalent polynomial over $\{1,-1\}$, and the inverse transformation (24).

2. From a boolean function to an equivalent polynomial over $\{0,1\}$ by (39), and the inverse transformation (40).

3. From a polynomial over $\{1,-1\}$ to a polynomial over $\{0,1\}$ by (34), and the inverse transformation (35).

The representation theory developed above can be used to represent error-correcting codes in a way that generalizes the representation described in Section 4. Consider the linear $[n,k]$ block code $C$. The code $C$ can be represented by viewing each coordinate of the code as a boolean function of $k$ variables. A vector $V \in C$ iff there exists a vector $X \in \{1,-1\}^k$ such that

$$ V = (f_1(X), f_2(X), \ldots, f_n(X)) $$

Clearly, the boolean functions associated with the coordinates of a linear block code are determined by the basis in which the code is represented. For linear block codes, every coordinate $f_i$ corresponds to an XOR of some of the variables (according to the basis of the code). Thus, for every $i$, the boolean function $f_i$ can be transformed by the method of Theorem 6.1 into an equivalent polynomial over $\{1,-1\}^k$ that consists of one monomial only. By the same argument as in Theorem 4.1, the MLD of a given word $W$ is equivalent to solving the following maximization problem, with $X \in \{1,-1\}^k$:

$$ \max_{X} \sum_{i=1}^{n} w_i f_i(X) \tag{41} $$

Observation 1: By Theorem 6.1, every monomial corresponds to a row in a Hadamard matrix. Since every $f_i$ corresponds to a monomial, it follows that every coordinate $i$ of a linear block code corresponds to a row in the Hadamard matrix of order $2^k$. By definition, the first order Reed-Muller code (see Section 4) consists of all the possible $2^k$ monomials. Thus, the $[2^k, k]$ first order Reed-Muller code is the set of all rows of the Hadamard matrix of order $2^k$. Hence, every linear block code is a punctured first order Reed-Muller code.

Observation 2: The MLD problem is equivalent to finding a codeword whose inner product with the received word is maximal over all codewords (using the $\{1,-1\}$ representation). By Observation 1, the MLD of a first order Reed-Muller code is equivalent to finding an entry with maximal value in the vector $H_{2^k} W$.

For a general linear block code: a linear block code is a punctured first order Reed-Muller code (by Observation 1). Thus, we can construct from $W$ a vector $\tilde{W}$ of length $2^k$ that has zeros in all coordinates that do not correspond to a coordinate of the code. The MLD problem of $W$ is again equivalent to finding an entry with maximal value in the vector $H_{2^k} \tilde{W}$. Hence, the Fast Hadamard Transform [15] can be used to decode any low-rate linear block code efficiently.
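Observation 2 and the zero-padding argument above lead directly to a short decoder. This is a sketch of our own, not the authors' implementation. Each coordinate is encoded as a bitmask of the message variables that appear in its monomial, which is a hypothetical convention. The received word is scattered into a vector of length $2^k$ that is zero at the punctured positions. An in-place fast Walsh-Hadamard butterfly then computes $H_{2^k}\tilde{W}$ with $O(k 2^k)$ additions. When several entries share the maximum, the smallest index wins.

```python
from itertools import product


def fwht(vec):
    """Fast Walsh-Hadamard transform: returns H_{2^k} vec (Sylvester order)."""
    v = list(vec)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def encode(x, columns):
    """Codeword in {1,-1} form: coordinate i is the monomial columns[i]."""
    word = []
    for mask in columns:
        value = 1
        for j, xj in enumerate(x):
            if (mask >> j) & 1:
                value *= xj
        word.append(value)
    return word


def mld_decode(columns, k, w):
    """Return (X, score) with X in {1,-1}^k maximizing sum_i w_i f_i(X), as in (41).
    columns[i] is a bitmask: bit j set means x_{j+1} appears in f_i."""
    padded = [0] * (2 ** k)             # zeros at the punctured coordinates
    for mask, wi in zip(columns, w):
        padded[mask] += wi
    scores = fwht(padded)               # scores[c] = <w, codeword of point c>
    best = max(range(2 ** k), key=lambda c: scores[c])
    x = [(-1) ** ((best >> j) & 1) for j in range(k)]
    return x, scores[best]


def brute_force(columns, k, w):
    """Exhaustive search over {1,-1}^k, for checking small cases only."""
    return max(sum(a * b for a, b in zip(w, encode(list(x), columns)))
               for x in product((1, -1), repeat=k))


# Example: the seven nonconstant monomials in x1, x2, x3.
columns = [1, 2, 4, 3, 5, 6, 7]
sent = encode([1, -1, 1], columns)
received = [-sent[0]] + sent[1:]        # one coordinate flipped
print(mld_decode(columns, 3, received))  # ([1, -1, 1], 5)
```

Every nonzero codeword of this example code has weight 4, so a single flipped coordinate is corrected and the true message scores $7 - 2 = 5$.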
Observation 3: The following simple facts about an $[n,k]$ linear block code follow directly from Observation 1:

• The vectors of length $2^k$ that represent the coordinates of a linear block code are orthogonal, since they correspond to columns of a Hadamard matrix.

• The number of 1's is equal to the number of $-1$'s in each coordinate, since the coordinates correspond to columns of a Hadamard matrix.

• The minimum weight (distance) of a first order Reed-Muller code is $2^{k-1} = 0.5n$, since the codewords are the rows of the Hadamard matrix of order $2^k$.

Observation 4: The MLD formulation (41) also holds for nonlinear codes. For nonlinear codes, a coordinate $f_i$ can consist of more than one monomial. For example, consider the following nonlinear code of 4 codewords:

$$ C = [(00100), (11111), (10101), (01011)] $$

Then,

$$ \begin{aligned} f_1 &= x_1x_2 \\ f_2 &= x_1 \\ f_3 &= 0.5(-1 - x_1 - x_2 + x_1x_2) \\ f_4 &= x_1 \\ f_5 &= 0.5(-1 + x_1 + x_2 + x_1x_2) \end{aligned} $$
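The coordinate polynomials above can be recomputed by applying Theorem 6.1 to each coordinate separately. This sketch makes one assumption of ours, which agrees with the polynomials listed: the codewords are given in the natural order of $X = (x_1, x_2)$, that is, for $X = (1,1), (-1,1), (1,-1), (-1,-1)$. Coefficients are returned in the order $1, x_1, x_2, x_1x_2$.

```python
from fractions import Fraction

# Codewords listed for X = (1,1), (-1,1), (1,-1), (-1,-1), in that order.
CODE = ["00100", "11111", "10101", "01011"]

# H_4 by Sylvester's construction; row r is the monomial given by the bits of r.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def coordinate_polynomials(code):
    """Coefficient vectors (1, x1, x2, x1x2) of every coordinate f_i."""
    polys = []
    for i in range(len(code[0])):
        column = [1 - 2 * int(word[i]) for word in code]   # 0 -> 1, 1 -> -1
        polys.append([Fraction(sum(h * p for h, p in zip(row, column)), 4)
                      for row in H4])
    return polys


for i, poly in enumerate(coordinate_polynomials(CODE), start=1):
    print(f"f{i}:", [str(c) for c in poly])
```

The output reproduces $f_1, \ldots, f_5$. Coordinates 3 and 5 are the only ones that require more than one monomial.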

From the above generalization, it follows that for both linear and nonlinear codes the MLD problem is equivalent to the maximization of a polynomial over $\{1,-1\}$. Hence, the following rather surprising theorem follows:

Theorem 6.3 The following 3 problems are equivalent:

1. Maximization of polynomials with rational coefficients over the $k$-cube.

2. The MLD problem of an $[n,k]$ linear block code.

3. The MLD problem of a block code, not necessarily linear, that consists of $2^k$ codewords.

7 Solving 0-1 nonlinear programming problems using decoding techniques

An unconstrained nonlinear 0-1 program [11] is a problem of the form:

$$ \max \sum_{i=1}^{m} c_i \prod_{j \in S_i} x_j \tag{42} $$

where each $S_i$ is a subset of $\{1, \ldots, n\}$, the $c_i$ are the coefficients, and $x_j \in \{0,1\}$ for all $j$. Basically, the problem in (42) is that of finding a maximum of a 0-1 polynomial. A special case of (42) is the quadratic polynomial over $\{-1,1\}$ that was presented in Section 2. Clearly, every polynomial over $\{-1,1\}$ can be transformed into an equivalent one over $\{0,1\}$ by a change of variable, as discussed in Section 6. The maximization of a quadratic polynomial over $\{0,1\}$ is known to be NP-hard [6]. One way to prove this is to show that the Maximum Cut problem in a graph can be reduced to it. The reduction is based on the same technique that is used in Section 2 to show the equivalence between quadratic energy functions and cuts in graphs.

The problem in (42) has been studied extensively [9, 11]. The main effort has concentrated on identifying special cases that are solvable in polynomial time [12] and on devising approximation techniques [10].

The most common technique for solving unconstrained 0-1 programs is to transform them into the problem of finding the maximum weight independent set in a graph [2, 19]. Finding the maximum weight independent set in a graph is NP-hard, but there are some solvable cases. For example, the problem is solvable in polynomial time (by min cut-max flow techniques) if the graph is bipartite [2]. Hence, a known class of problems of the form (42) that are solvable in polynomial time consists of those that correspond to finding the maximum weight independent set in a bipartite graph.

Definition: Let $G = (V, E)$ be a graph. $S$ is an independent set of nodes in the graph iff $S \subseteq V$ and no two nodes of $S$ are connected by an edge. Suppose that every node in $V$ is assigned a positive integer called the weight of the node. The problem of finding an independent set of nodes such that the sum of its weights is maximal over all possible independent sets is known as the maximum weight independent set problem.

The problem in (42) is transformed into the problem of finding the maximum weight independent set by using the concept of a conflict graph of a 0-1 polynomial [2, 19]. The idea is presented by the following example.

Example: Consider the following 0-1 polynomial:

$$ f = -2X_1 - 2X_2 + 5X_1X_2 - 4X_1X_2X_3 \tag{43} $$

One can show that $f$ can be transformed into an equivalent polynomial in which all the terms (except the constant one) have positive coefficients. The new polynomial involves both the variables and their complements. This is done by noticing that

$$ \bar{X} = 1 - X $$

so that, for instance, $-2X_1 = -2 + 2\bar{X}_1$ and $-4X_1X_2X_3 = -4X_1X_2 + 4X_1X_2\bar{X}_3$. Hence,

$$ f = -4 + 2\bar{X}_1 + 2\bar{X}_2 + X_1X_2 + 4X_1X_2\bar{X}_3 \tag{44} $$

Clearly, maximizing $f$ is equivalent to maximizing $f$ without the constant term, so the constant term can be neglected.
Figure 3: The conflict graph associated with $f$.


The conflict graph, denoted by $G(f)$, associated with a polynomial $f$ has one node for each term of $f$ except the constant term. Two nodes in $G(f)$ are connected by an edge iff one of the corresponding terms contains a variable and the other corresponding term contains the same variable complemented. The weight of a node in $G(f)$ is the coefficient of the corresponding term in $f$. Figure 3 shows the conflict graph associated with $f$ above.

The maximum weight independent set of $G(f)$ is $\{2, 4\}$; that is, the nodes that correspond to $X_1X_2$ and to $X_1X_2\bar{X}_3$. The weight of the set is 5, and the assignment that achieves the maximum is $X_1 = 1$, $X_2 = 1$ and $X_3 = 0$. Thus, the maximum of $f$ is $-4 + 5 = 1$.
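The example is small enough to check exhaustively. This is our own sketch, and the term labels are hypothetical. It verifies the rewriting (44), builds the conflict graph from the literals of each term, and finds the maximum weight independent set by brute force. Exhaustive search works here only because the graph has four nodes. For bipartite conflict graphs, the text points to min cut-max flow techniques [2] instead.

```python
from itertools import combinations, product


def f(x1, x2, x3):
    """The 0-1 polynomial (43)."""
    return -2 * x1 - 2 * x2 + 5 * x1 * x2 - 4 * x1 * x2 * x3


# Terms of (44) without the constant -4: weight and required literal values.
# A complemented variable must be 0 for its term to equal 1.
TERMS = {
    "~X1": (2, {1: 0}),
    "~X2": (2, {2: 0}),
    "X1X2": (1, {1: 1, 2: 1}),
    "X1X2~X3": (4, {1: 1, 2: 1, 3: 0}),
}


def term_value(literals, x):
    return int(all(x[var - 1] == val for var, val in literals.items()))


def conflict(a, b):
    """Edge iff a variable is plain in one term and complemented in the other."""
    lits_a, lits_b = TERMS[a][1], TERMS[b][1]
    return any(v in lits_b and lits_b[v] != val for v, val in lits_a.items())


def max_weight_independent_set():
    """Exhaustive search over all node subsets (fine for tiny graphs only)."""
    best_weight, best_set = 0, ()
    for r in range(1, len(TERMS) + 1):
        for subset in combinations(TERMS, r):
            if any(conflict(a, b) for a, b in combinations(subset, 2)):
                continue
            weight = sum(TERMS[t][0] for t in subset)
            if weight > best_weight:
                best_weight, best_set = weight, subset
    return best_weight, best_set


# (44) agrees with (43) on the whole cube.
for x in product((0, 1), repeat=3):
    rewritten = -4 + sum(w * term_value(lits, x) for w, lits in TERMS.values())
    assert rewritten == f(*x)

print(max_weight_independent_set())                          # (5, ('X1X2', 'X1X2~X3'))
print(max(product((0, 1), repeat=3), key=lambda x: f(*x)))   # (1, 1, 0)
```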


One can prove that the above procedure works in the general case; that is:

1. Every maximum weight independent set in $G(f)$ corresponds to a maximum of $f$ (and vice versa), with the values of the terms associated with the nodes in the set equal to 1.

2. In general, the problem of finding the maximum weight independent set in a graph is solvable in polynomial time for bipartite graphs (the graph in Figure 3 is bipartite).

3. The conflict graph associated with a polynomial is not unique, because a term can be made positive by complementing any odd number of its variables.

In the following we show how decoding techniques can be used to maximize 0-1 nonlinear programs. Consider the 0-1 polynomials associated with Hamming codes (see Section 4). The family of these polynomials will be denoted by HP (Hamming Polynomials). It will be shown, by an example, that HP is not contained in the family of polynomials related to bipartite conflict graphs. Thus, HP is not a subset of the family of polynomials whose maximization is known to be easy.

Consider the following polynomial over $\{0,1\}$:

$$ M = 3 - 6X_1 - 2X_2 - 2X_3 - 6X_4 + 4(X_1X_2 + X_1X_3 + X_1X_4 + X_2X_3 + X_2X_4 + X_3X_4) - 8X_2X_3X_4 \tag{45} $$

The polynomial $M$ is not associated with a bipartite conflict graph, as stated in the next proposition.
Proposition 7.1 There is no conflict graph $G(M)$ associated with $M$ that is bipartite.

Proof: The proof is straightforward; it follows from checking all the possible ways to convert the sign of the cubic term. □

A maximum of a polynomial that belongs to HP can be found by applying the decoding procedure for Hamming codes. Also, there is an efficient method to recognize whether a given polynomial is in HP. We will describe both the recognition procedure and the maximization procedure by continuing with the above example.

Proposition 7.2 The polynomial $M$ is a Hamming polynomial.

Proof:

1. Transform $M$ into an equivalent polynomial over $\{-1,1\}$ by the change of variable $X = 0.5(1 - U)$ (as in Section 6):

$$ M = -U_2 - U_3 + U_4 + U_1U_2 + U_1U_3 + U_1U_4 + U_2U_3U_4 \tag{46} $$


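2. By the derivation in Section 4, it is clear that $M$ is equivalent to MLD of

$$ u = (1, 1, 0, 0, 0, 0, 0) $$

(the coefficients $(-1, -1, 1, 1, 1, 1, 1)$ of (46) written in binary) with regard to the code defined by the following generator matrix, whose columns list the variables of the corresponding terms of (46):

$$ G = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \end{pmatrix} \tag{47} $$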


3. The matrix $G$ can be brought to a systematic form, denoted by $\hat{G}$, by row operations:

$$ \hat{G} = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 \end{pmatrix} \tag{48} $$

From $\hat{G}$ we obtain $H$, the systematic parity check matrix (see Section 5):

$$ H = \begin{pmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{pmatrix} \tag{49} $$

The polynomial $M$ is a Hamming polynomial because its parity check matrix contains all the possible nonzero columns.
4. To decode $u$ we use the syndrome, that is:

$$ uH^T = (0, 1, 0) $$

The error is in the location that corresponds to the row of $H^T$ equal to the syndrome (here, the sixth position). Hence, the result of the decoding is $(1, 1, 0, 0, 0, 1, 0)$. The maximum is attained at $X^* = (1, 1, 1, 0)$, and $M(X^*) = 5$.

By the above procedure we proved that $M$ is in HP and found its maximum. □
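The full procedure for $M$ can be replayed in a few lines. This is a sketch of our own, and the helper names are hypothetical. It checks the change of variable behind (46) at all 16 points, computes the syndrome with the matrix $H$ of (49), and corrects the single error. It then maps the decoded codeword back to $X^*$ using the term order $U_2, U_3, U_4, U_1U_2, U_1U_3, U_1U_4, U_2U_3U_4$ of (46), and compares the result with exhaustive search.

```python
from itertools import product


def M(x1, x2, x3, x4):
    """The 0-1 polynomial (45)."""
    pairs = x1 * x2 + x1 * x3 + x1 * x4 + x2 * x3 + x2 * x4 + x3 * x4
    return 3 - 6 * x1 - 2 * x2 - 2 * x3 - 6 * x4 + 4 * pairs - 8 * x2 * x3 * x4


def M_pm(u1, u2, u3, u4):
    """The {1,-1} form (46)."""
    return -u2 - u3 + u4 + u1 * u2 + u1 * u3 + u1 * u4 + u2 * u3 * u4


H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]


def syndrome(word):
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)


def decode(word):
    """Single-error correction: flip the position whose column of H
    (row of H^T) equals the syndrome."""
    s = syndrome(word)
    corrected = list(word)
    if any(s):
        corrected[list(zip(*H)).index(s)] ^= 1
    return corrected


def codeword_to_x(c):
    """Recover X from a codeword, using X = 0.5 * (1 - U)."""
    u2, u3, u4 = 1 - 2 * c[0], 1 - 2 * c[1], 1 - 2 * c[2]
    u1 = (1 - 2 * c[3]) * u2              # coordinate 4 is U1*U2
    return tuple((1 - u) // 2 for u in (u1, u2, u3, u4))


# The change of variable X = 0.5 * (1 - U) turns (45) into (46).
for u in product((1, -1), repeat=4):
    assert M(*((1 - ui) // 2 for ui in u)) == M_pm(*u)

u = [1, 1, 0, 0, 0, 0, 0]
print(syndrome(u))                      # (0, 1, 0)
c = decode(u)
print(c)                                # [1, 1, 0, 0, 0, 1, 0]
x_star = codeword_to_x(c)
print(x_star, M(*x_star))               # (1, 1, 1, 0) 5
assert M(*x_star) == max(M(*x) for x in product((0, 1), repeat=4))
```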

A few remarks with regard to the above procedure:

1. The procedure in the above proof can be applied to a general 0-1 polynomial. Considering the polynomial representation over $\{-1,1\}$ (the one obtained after step 1 above), a necessary condition for a polynomial to be in HP is that the absolute values of the coefficients in the $\{-1,1\}$ representation are equal (the constant is neglected).

2. The complexity of the recognition process is determined by the complexity of the transformation from the $\{0,1\}$ representation to the $\{1,-1\}$ representation (step 1). This transformation is exponential in the degree of the polynomial over $\{0,1\}$.

By Section 4, maximization of polynomials over $\{-1,1\}$ with coefficients in $\{-1,1\}$ is equivalent to the MLD problem of linear block codes. The generalization to polynomials that have integer (or rational) coefficients follows immediately by expressing a term whose coefficient equals $a$ (a positive integer) as $a$ identical terms with coefficients equal to 1.

To summarize, we have established a technique for solving 0-1 nonlinear programs by decoding techniques. In particular, for the family of Hamming polynomials, we proved that this family is not a subset of the family of polynomials associated with bipartite conflict graphs, and we derived both a recognition procedure and a solution procedure.
Abstract

A Boolean function is polynomial threshold if it can be represented as a sign function of a polynomial whose number of terms is polynomial in the number of variables. The main result shows that the class of polynomial threshold functions (which we call $PT_1$) is strictly contained in the class of Boolean functions that can be computed by a depth 2, unbounded fan-in, polynomial size circuit of linear threshold gates (which we call $LT_2$).

We use harmonic analysis of Boolean functions to derive a necessary and sufficient condition for a function to be an $S$-threshold function for a given set $S$ of monomials. We use this condition to show that the number of different $S$-threshold functions, for a given $S$, is at most $2^{n|S|}$. These results turn out to be a generalization of known results for linear threshold functions. Based on the necessary and sufficient condition, we derive a lower bound on the number of monomials in a threshold function. The lower bound is expressed in terms of the spectral representation of a Boolean function. We find that Boolean functions that have an exponentially small spectrum are not polynomial threshold. We exhibit a family of functions that has an exponentially small spectrum; we call them 'semi-bent' functions. We construct a function that is both semi-bent and symmetric to prove that $PT_1$ is properly contained in $LT_2$.

We  also  extend  the  lower  bound  technique  to  depth  2  circuits  of  linear  threshold  gates. 

*Submitted to SIAM Journal on Discrete Mathematics, 1988.
1 Introduction


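A Boolean function $f(X)$ is a threshold function if

$$ f(X) = \operatorname{sgn}(F(X)) = \begin{cases} 1 & \text{if } F(X) > 0 \\ -1 & \text{if } F(X) < 0 \end{cases} $$

where

$$ F(X) = \sum_{\alpha \in \{0,1\}^n} w_\alpha X^\alpha $$

and

$$ X^\alpha = \prod_{i=1}^{n} x_i^{\alpha_i} $$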

Throughout this paper a Boolean function will be defined as $f : \{1,-1\}^n \to \{1,-1\}$; namely, 0 and 1 are represented by 1 and $-1$, respectively. It is also assumed, without loss of generality, that $F(X) \neq 0$ for all $X$.


A threshold gate is a gate that computes a threshold function. It can be shown that any Boolean function can be computed by a single threshold gate if we allow the number of monomials in $F(X)$ to be as large as $2^n$. Although this result is not interesting by itself, it prompts the following natural question: what happens when the number of monomials (terms) in $F(X)$ is bounded by a polynomial in $n$?


The question can be formulated by defining a new complexity class of Boolean functions. This class, called $PT_1$ for Polynomial Threshold functions, consists of all the Boolean functions that can be computed by a single threshold gate in which the number of monomials is bounded by a polynomial in $n$. The main goal of this paper is the study of this complexity class and its relations with other known complexity classes of Boolean functions.


More precisely, let $S \subseteq \{0,1\}^n$; a Boolean function $f$ is an $S$-threshold function if there exist integers, which we call weights (the $w_\alpha$'s), such that $f(X) = \operatorname{sgn}\left(\sum_{\alpha \in S} w_\alpha X^\alpha\right)$. Hence, a Boolean function $f(X)$ is in $PT_1$ if there exists a set $S$, with $|S|$ bounded by a polynomial in $n$, such that $f(X)$ is $S$-threshold. Notice that there is no restriction on the size of the weights.

A related class of functions is the class of linear threshold functions [6, 10]. A Boolean function is linear threshold if it is $S$-threshold with $S$ corresponding to the constant and linear monomials. We define $LT_1$ to be the class of functions that are computable by a single linear threshold gate.

The next step is to define complexity classes that relate to circuits. Define $LT_d$ ($PT_d$) to be the class of Boolean functions that can be computed by an unbounded fan-in, polynomial size circuit of depth at most $d$ that consists of linear (polynomial) threshold gates.

Recently, there has been considerable interest in the study of the computational model of bounded depth, unbounded fan-in, polynomial size circuits that consist of linear threshold gates [5, 11, 13]. This interest follows from recent results in circuit complexity [7, 12, 15], which indicate that MAJORITY (hence, linear threshold functions) cannot be computed by a bounded depth, unbounded fan-in, polynomial size circuit that consists of $\vee$, $\wedge$, NOT and PARITY gates. Thus, the next natural step in the analysis is adding MAJORITY as a possible gate in the computational model. Notice that in the results in [5] the weights are bounded by a polynomial in $n$. To distinguish this from the case in which the weights are not bounded, we put 'hats' on the class names. Namely, $\widehat{LT}_d$ and $\widehat{PT}_d$ correspond to circuits with bounded weights. Using this notation, a related result in [5] is:

$$ \widehat{LT}_1 \subsetneq \widehat{LT}_2 \subsetneq \widehat{LT}_3 $$

In this context, the study of circuits of polynomial threshold functions can be viewed as the study of a model in which a single gate is rather powerful. Namely, there is no 'trivial' function that cannot be computed by a single gate. For example, PARITY, EXACT$_k$ (output $-1$ iff $k$ of its inputs are $-1$) and the characteristic function of a linear subspace (code) [1, 9] are all in $PT_1$, but none of them is in $LT_1$ (see Appendix A). This fact suggests that separating the classes $PT_1$ and $PT_2$ is not going to be an easy problem.

The main result in the paper is a characterization of the power of $PT_1$ with respect to the hierarchy of circuits of linear threshold functions. We have:

$$ LT_1 \subsetneq PT_1 \subsetneq LT_2 $$

which also implies that $PT_1 \subsetneq PT_2$.

Clearly, $LT_1 \subsetneq PT_1$ follows from the fact that PARITY is not in $LT_1$, while $\text{PARITY}(X) = \operatorname{sgn}(x_1x_2 \cdots x_n)$ is in $PT_1$. The proof that $PT_1 \subseteq LT_2$, given in Section 2, is based on the observation that PARITY does not require the full strength of a depth 2 circuit of linear threshold elements. In order to prove that this containment is proper, we developed a technique for deriving lower bounds on the number of monomials in a threshold logic function. This technique is based on the spectral representation of a Boolean function. Most of the paper is devoted to the development of this technique and its applications.

In Section 3 we review the subject of harmonic analysis of Boolean functions [8] and show that every Boolean function has a representation as a polynomial over the rationals, and hence as a threshold function.

In Section 4 we use the spectral representation of Boolean functions and derive a necessary and sufficient condition for a function to be an $S$-threshold function for a given $S$. We use this condition to show that the number of different $S$-threshold functions, for a given $S$, is at most $2^{n|S|}$. These results turn out to be a generalization of known results for linear threshold functions [2, 6, 10].

In  Section  5  we  use  the  necessary  and  sufficient  condition  to  derive  a  lower  bound  on  the 
number  of  monomials  in  a  threshold  function.  The  lower  bound  is  expressed  in  terms  of  the 
spectral  representation  of  a  Boolean  function.  We  find  that  Boolean  functions  that  have  an 
exponentially  small  spectrum  are  not  polynomial  threshold. 

In Section 6 we exhibit a family of functions that has an exponentially small spectrum; we call them 'semi-bent' functions. We construct a function that is both semi-bent and symmetric to prove that $PT_1$ is properly contained in $LT_2$.

In Section 7 we show how the lower bound technique can be extended to obtain the result in [5] that $\widehat{LT}_2 \subsetneq \widehat{LT}_3$. Hence, for bounded weight circuits we have:

$$ \widehat{LT}_1 \subsetneq \widehat{PT}_1 \subsetneq \widehat{LT}_2 \subsetneq \widehat{PT}_2 $$

Finally, we address some open problems.


2  Simulation  of  Polynomial  Threshold  Functions 


It is a well-known result that PARITY (as well as other symmetric functions) is in $LT_2$ [5, 11]. It follows that a polynomial threshold function can be simulated by a depth 3 circuit of linear threshold gates. The idea is to compute the monomials using depth 2 circuits and to combine the monomials in the gate in the third layer.

What we will show here is that depth 2 is enough:

Theorem 2.1

$$ PT_1 \subseteq LT_2 $$


Proof:  The  idea  is  to  notice  that  PARITY  does  not  require  the  full  power  of  a  depth 
2  circuit  of  linear  threshold  gates.  Actually,  PARITY  can  be  realized  by  a  set  of  linear 
threshold  elements  in  the  first  layer  while  in  the  second  layer  we  need  only  to  sum  and  add 
a  constant  to  get  the  desired  result.  Namely,  we  do  not  have  to  use  the  threshold  operation 
in  the  second  layer. 

Example: Let $f(X) = x_1 \oplus x_2$. Let $F_1(X) = -1 - x_1 - x_2$ and $F_2(X) = -1 + x_1 + x_2$. It can be verified that

$$ f(X) = 1 + \operatorname{sgn}(F_1(X)) + \operatorname{sgn}(F_2(X)) $$

Note that we are using the $\{1,-1\}$ representation instead of $\{0,1\}$.

The above is true in general for PARITY of $n$ variables. In the general case we need up to $n+1$ linear threshold gates in the first layer and, again, only summation and addition of a constant in the second layer. Using this observation, a polynomial threshold function can be simulated by a depth 2 linear threshold circuit in a way similar to the depth 3 simulation. □
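The XOR example, together with one way to extend it to $n$ variables, can be checked exhaustively. The general construction below is our own assumption, not necessarily the authors'. Let $s = \sum_i x_i$. Gate $k$ ($k = 0, \ldots, n-1$) outputs $t_k = \operatorname{sgn}(s - n + 2k + 1)$, which equals $+1$ iff at most $k$ inputs equal $-1$. The argument $s - n + 2k + 1$ is always odd, so it is never zero. The second layer returns $(-1)^n + \sum_k (-1)^k (1 + t_k)$ and applies no threshold. This uses $n$ first-layer gates, within the "up to $n+1$" stated above. For $n = 2$ it reduces to the example, since $t_0 = \operatorname{sgn}(F_2)$ and $t_1 = -\operatorname{sgn}(F_1)$.

```python
from itertools import product


def sgn(v):
    return 1 if v > 0 else -1


def parity(x):
    """PARITY in the {1,-1} representation is the product of the inputs."""
    p = 1
    for xi in x:
        p *= xi
    return p


def xor_example(x1, x2):
    """The two-gate realization of x1 XOR x2 from the example."""
    return 1 + sgn(-1 - x1 - x2) + sgn(-1 + x1 + x2)


def parity_depth2(x):
    """First layer: n linear threshold gates on s = sum(x).
    Second layer: a weighted sum plus a constant, no threshold."""
    n, s = len(x), sum(x)
    gates = [sgn(s - n + 2 * k + 1) for k in range(n)]
    return (-1) ** n + sum((-1) ** k * (1 + t) for k, t in enumerate(gates))


assert all(xor_example(a, b) == a * b for a, b in product((1, -1), repeat=2))
for n in range(1, 8):
    assert all(parity_depth2(x) == parity(x) for x in product((1, -1), repeat=n))
print("depth-2 parity construction verified for n = 1..7")
```

Each first-layer gate is a true linear threshold gate with weights $\pm 1$ and an integer bias, while the output layer is purely linear. That is the property the proof relies on.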

Proving containment of polynomial threshold functions in $LT_2$ turns out to be very easy compared to proving that this containment is proper; the latter requires proving lower bounds. The rest of the paper is devoted to developing a technique for obtaining lower bounds for polynomial threshold functions and to applying this technique to obtain separation results.


3  Polynomial  Representation  of  Boolean  Functions 


In  this  section  the  representation  of  Boolean  functions  as  polynomials  over  the  field  of  rational 
numbers  is  presented. 

Definition: A Hadamard matrix of order $m$, denoted by $H_m$, is an $m \times m$ matrix of $+1$'s and $-1$'s such that

$$ H_m H_m^T = m I_m \tag{1} $$

where $I_m$ is the $m \times m$ identity matrix. The above definition is equivalent to saying that any two rows of $H_m$ are orthogonal.

Hadamard matrices of order $2^k$ exist for all $k \geq 0$. The so-called Sylvester construction is as follows [9]:

$$ H_1 = [1], \qquad H_{2^{n+1}} = \begin{bmatrix} H_{2^n} & H_{2^n} \\ H_{2^n} & -H_{2^n} \end{bmatrix} \tag{2} $$
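Here is a short sketch of the Sylvester construction (2), with a check of the defining property (1). It also checks the closed form $[H_{2^n}]_{r,c} = (-1)^{|r \wedge c|}$, where $|r \wedge c|$ is the number of bits shared by the binary expansions of $r$ and $c$. This is a standard fact that the text does not state. It explains why row $r$ of $H_{2^n}$ evaluates the monomial containing $x_{i+1}$ exactly when bit $i$ of $r$ is set.

```python
def sylvester(n):
    """H_{2^n} via (2): [[H, H], [H, -H]] starting from H_1 = [1]."""
    h = [[1]]
    for _ in range(n):
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def is_hadamard(h):
    """Check that h has entries +-1 and satisfies H H^T = m I_m, as in (1)."""
    m = len(h)
    if any(v not in (1, -1) for row in h for v in row):
        return False
    return all(
        sum(a * b for a, b in zip(h[i], h[j])) == (m if i == j else 0)
        for i in range(m)
        for j in range(m)
    )


for n in range(6):
    h = sylvester(n)
    assert is_hadamard(h)
    assert all(h[r][c] == (-1) ** bin(r & c).count("1")
               for r in range(2 ** n) for c in range(2 ** n))
print(sylvester(2))
# [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
```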


Definition: Given a Boolean function $f$ of order $n$, $p$ is a polynomial (with coefficients over the field of rational numbers) equivalent to $f$ iff for all $X \in \{1,-1\}^n$:

$$ f(X) = p(X) $$

As an example, let $f = x_1 \oplus x_2$; that is, $f$ is the XOR function of two variables. It is easy to check that in the $\{1,-1\}$ representation $p(x_1, x_2) = x_1x_2$.

Notice that for every Boolean function $f$, the polynomial $p$ is linear in each of its variables because $x^2 = 1$ for $x \in \{-1,1\}$. It is known that every Boolean function has a unique representation as a polynomial [1, 8]. This representation is derived by using the Hadamard matrix, as described by the following theorem.

Theorem 3.1 Let $f$ be a Boolean function of order $n$. Let $p$ be a polynomial equivalent to $f$. Let $A_{2^n}$ denote the vector of coefficients of $p$. Let $P_{2^n}$ denote the vector of the $2^n$ values of $p$ (and $f$). Then:

1. The polynomial $p$ always exists and is unique.

2. The coefficients of $p$ are computed as follows:

$$ A_{2^n} = \frac{1}{2^n} H_{2^n} P_{2^n} $$

Proof: The proof is constructive. The idea is to compute $A_{2^n}$ by solving a system of linear equations. Let us start by computing the coefficients of $p$ for $f$ a function of one variable:

$$ p(x_1) = a_0 + a_1 x_1, \qquad p(1) = a_0 + a_1, \qquad p(-1) = a_0 - a_1 $$

Clearly,

$$ P_2 = H_2 A_2 $$

and by (1),

$$ A_2 = \frac{1}{2} H_2 P_2 $$
The above can be generalized to $n$ variables by induction on $n$. Assume it is true for $n$:

$$ P_{2^n} = H_{2^n} A_{2^n} $$

For $n+1$, consider the two values of $x_{n+1}$ and get

$$ P_{2^{n+1}} = \begin{bmatrix} H_{2^n} & H_{2^n} \\ H_{2^n} & -H_{2^n} \end{bmatrix} A_{2^{n+1}} = H_{2^{n+1}} A_{2^{n+1}} $$

Hadamard matrices are nonsingular; thus, for any given $f$, a unique $p$ (defined by the vector of coefficients $A_{2^n}$) always exists. □

Example: Consider the function $f$ of 3 variables,

$$f(x_1, x_2, x_3) = (x_1 \wedge x_3) \vee (x_1 \wedge x_2) \qquad (5)$$

By Theorem 3.1,

$$p(x_1, x_2, x_3) = \frac{1}{8}(2 + 6x_1 + 2x_2 - 2x_1x_2 + 2x_3 - 2x_1x_3 + 2x_2x_3 - 2x_1x_2x_3) \qquad (6)$$
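Theorem 3.1 translates directly into code. The sketch below is our own illustration (the helper names `point`, `spectrum`, `xor` and `f5` are hypothetical). It lists the values $P_{2^n}$ in the order used in the proof: bit $k$ of the index $i$ set means $x_{k+1} = -1$, and the same bit of $\alpha$ set means $x_{k+1}$ appears in the monomial. It assumes, as the coefficients in (6) indicate, that the Boolean value TRUE is encoded as $-1$ and FALSE as $1$; exact rational arithmetic via `fractions.Fraction` avoids rounding.

```python
from fractions import Fraction


def sylvester(n):
    """Sylvester-type Hadamard matrix H_{2^n}, built by recursion (2)."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def point(i, n):
    """Input X indexed by i: x_{k+1} = -1 iff bit k of i is set."""
    return [-1 if (i >> k) & 1 else 1 for k in range(n)]


def spectrum(f, n):
    """A_{2^n} = 2^{-n} H_{2^n} P_{2^n}; entry alpha is the coefficient of
    the monomial containing x_{k+1} exactly when bit k of alpha is set."""
    P = [f(point(i, n)) for i in range(2 ** n)]
    return [Fraction(sum(h * p for h, p in zip(row, P)), 2 ** n)
            for row in sylvester(n)]


def xor(X):
    """x1 XOR x2 in the {1,-1} representation is the product x1 * x2."""
    return X[0] * X[1]


def f5(X):
    """Function (5), (x1 AND x3) OR (x1 AND x2), with TRUE encoded as -1."""
    x1, x2, x3 = X
    return -1 if x1 == -1 and (x2 == -1 or x3 == -1) else 1


if __name__ == "__main__":
    print([int(a) for a in spectrum(xor, 2)])     # [0, 0, 0, 1]
    print([int(8 * a) for a in spectrum(f5, 3)])  # [2, 6, 2, -2, 2, -2, 2, -2]
```

The coefficients come out in the order $1, x_1, x_2, x_1x_2, x_3, x_1x_3, x_2x_3, x_1x_2x_3$, which is the order of the terms in (6).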

The entries of the vector $A$ are denoted by $\{a_\alpha \mid \alpha \in \{0,1\}^n\}$ and called the spectral representation of a function. Note that $a_\alpha$ is the coefficient of $X^\alpha$ in the polynomial representation, where $X^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$.

The above method is applicable not only to Boolean functions but also to any function of the form $f : \{1,-1\}^n \to \mathbb{R}$.

4  Necessary  and  Sufficient  Conditions 

We use the polynomial representation of Boolean functions that was developed in the previous section to derive a necessary and sufficient condition for a function to be an $S$-threshold function, for arbitrary $S$. This result turned out to be a generalization of a known result for linear threshold functions [2, 6, 10].

Let $f(X) = \operatorname{sgn}(F(X))$ be a threshold function. Without loss of generality, assume that $F(X) \neq 0$ for all $X \in \{1,-1\}^n$. The following simple lemma enables us to express the relation between $f(X)$ and $F(X)$ in a global way. That is, instead of having $2^n$ conditions we have only one:


Lemma 4.1 Let $f(X)$ be a Boolean function and let $F(X) \neq 0$ for all $X \in \{1,-1\}^n$; then

$$f(X) = \operatorname{sgn}(F(X)) \quad \forall X \in \{1,-1\}^n \qquad (7)$$

if and only if

$$\sum_{X \in \{1,-1\}^n} |F(X)| = \sum_{X \in \{1,-1\}^n} f(X)F(X) \qquad (8)$$

Proof: Suppose there exists $X_1$ that violates (7); since $F(X) \neq 0$ for all $X$, this implies that

$$|F(X_1)| > f(X_1)F(X_1)$$

Hence, (8) is also not true: $|F(X)| \ge f(X)F(X)$ for every $X$, so a violation of (7) can only decrease the value of the right-hand side of (8). Clearly, if (7) is true so is (8). □

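Lemma 4.2

$$\sum_{X \in \{1,-1\}^n} X^\alpha = \begin{cases} 2^n & \text{if } \alpha = \text{all-0 vector} \\ 0 & \text{else} \end{cases} \qquad (9)$$

Proof: Follows from the fact that $X^\alpha$ corresponds to a row of a Sylvester-type Hadamard matrix (see Theorem 3.1): the all-0 vector gives the all-ones row, and every other row is orthogonal to it, so its entries sum to 0. □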

The necessary and sufficient condition follows from (8) by using the polynomial representation of a Boolean function.

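Theorem 4.1 Fix $S \subseteq \{0,1\}^n$. Let $F(X) = \sum_{\alpha \in S} w_\alpha X^\alpha$. Let $f(X)$, $X \in \{-1,1\}^n$, be a Boolean function with spectral representation $\{a_\alpha \mid \alpha \in \{0,1\}^n\}$. Then:

$$f(X) = \operatorname{sgn}(F(X)) \quad \forall X \in \{1,-1\}^n$$

if and only if

$$\sum_{X \in \{1,-1\}^n} |F(X)| = 2^n \sum_{\alpha \in S} w_\alpha a_\alpha \qquad (10)$$

Proof: By Lemma 4.1 it is enough to show that

$$\sum_{X \in \{1,-1\}^n} f(X)F(X) = 2^n \sum_{\alpha \in S} w_\alpha a_\alpha$$

We write $f(X)$ as a polynomial:

$$f(X) = \sum_{\alpha \in \{0,1\}^n} a_\alpha X^\alpha$$

and get that the constant term of $f(X)F(X)$ is $\sum_{\alpha \in S} w_\alpha a_\alpha$, since $X^\alpha X^\beta$ is the constant monomial iff $\alpha = \beta$. Hence, the result follows from Lemma 4.2. □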
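Condition (10) can be tested mechanically on small examples. In the sketch below (our own illustration; `monomial`, `spectrum`, `satisfies_10`, `maj3` and `xor2` are hypothetical names), `spectrum` computes $a_\alpha = 2^{-n}\sum_X f(X)X^\alpha$ directly, which is the entry-wise form of Theorem 3.1, and `satisfies_10` compares both sides of (10) for a weight dictionary indexed by $\alpha \in S$ encoded as bitmasks. It checks the majority of three variables against $F = x_1 + x_2 + x_3$, and XOR against $F = 1 + x_1 + x_2$, which does not represent it.

```python
from fractions import Fraction
from itertools import product


def monomial(X, alpha):
    """X^alpha: product of x_{k+1} over the bits k that are set in alpha."""
    value = 1
    for k, x in enumerate(X):
        if (alpha >> k) & 1:
            value *= x
    return value


def spectrum(f, n):
    """a_alpha = 2^{-n} * sum_X f(X) X^alpha for every alpha in {0,1}^n."""
    points = list(product((1, -1), repeat=n))
    return [Fraction(sum(f(X) * monomial(X, alpha) for X in points), 2 ** n)
            for alpha in range(2 ** n)]


def satisfies_10(f, w, n):
    """Test (10) for F(X) = sum over alpha in S of w[alpha] X^alpha, S = w.keys()."""
    points = list(product((1, -1), repeat=n))
    F = [sum(wa * monomial(X, alpha) for alpha, wa in w.items()) for X in points]
    if any(v == 0 for v in F):
        raise ValueError("Theorem 4.1 assumes F(X) != 0 for all X")
    a = spectrum(f, n)
    lhs = sum(abs(v) for v in F)
    rhs = 2 ** n * sum(wa * a[alpha] for alpha, wa in w.items())
    return lhs == rhs


def maj3(X):
    """Majority of three variables: sgn(x1 + x2 + x3)."""
    return 1 if sum(X) > 0 else -1


def xor2(X):
    """XOR of two variables, x1 * x2 in the {1,-1} representation."""
    return X[0] * X[1]


if __name__ == "__main__":
    # S = {x1, x2, x3}: alpha = 1, 2, 4 as bitmasks.
    print(satisfies_10(maj3, {1: 1, 2: 1, 4: 1}, 3))  # True
    # F = 1 + x1 + x2 (alpha = 0 is the constant term) does not represent XOR.
    print(satisfies_10(xor2, {0: 1, 1: 1, 2: 1}, 2))  # False
```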

Theorem 4.1 is interesting because it suggests that an $S$-threshold function is fully characterized by the set of spectral coefficients that correspond to $S$.

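Theorem 4.2 Let $f_1$ and $f_2$ be $S$-threshold functions and let $\{a^i_\alpha \mid \alpha \in \{0,1\}^n\}$, $i = 1, 2$, be their spectral representations, respectively. Then $f_1(X) = f_2(X)$ for all $X \in \{1,-1\}^n$ iff $a^1_\alpha = a^2_\alpha$ for all $\alpha \in S$.

Proof: Suppose $f_1 = f_2$. By the uniqueness of the spectral representation (Theorem 3.1) we get the 'only if'. Now suppose $a^1_\alpha = a^2_\alpha$ for all $\alpha \in S$. By Theorem 4.1 and the assumption that $f_1$ is $S$-threshold, the weights of $F_1$ with $f_1 = \operatorname{sgn}(F_1)$ satisfy (10) for $f_1$; since the right-hand side of (10) involves only the coefficients $a_\alpha$ with $\alpha \in S$, the same weights satisfy (10) for $f_2$. Hence, $f_1(X) = f_2(X) = \operatorname{sgn}(F_1(X))$ for all $X \in \{1,-1\}^n$. □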

Corollary 4.1 Consider the set $S \subseteq \{0,1\}^n$. Let $f_1$ and $f_2$ be Boolean functions of $n$ variables. If $a^1_\alpha = a^2_\alpha$ for all $\alpha \in S$, then either both $f_1$ and $f_2$ are $S$-threshold or both are not $S$-threshold.

One application of the above is counting the number of different $S$-threshold functions.

Theorem 4.3 Let $S \subseteq \{0,1\}^n$. There are at most $2^{n|S|}$ different $S$-threshold functions.

Proof: It can be shown that for all $\alpha \in \{0,1\}^n$, $a_\alpha$ can assume at most $2^n$ different values. Hence, there are at most $2^{n|S|}$ different sets of $|S|$ spectral coefficients. Thus, the result follows from Corollary 4.1. □
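For very small $n$ these statements can be checked exhaustively. The sketch below is our own illustration, not the paper's method: for $n = 2$ it enumerates integer weights $w_0, w_1, w_2$ in a hypothetical search range $\{-3, \dots, 3\}$, keeps those with $F(X) \neq 0$ everywhere, and collects the resulting linear threshold functions, i.e. $S$-threshold functions with $S = \{00, 10, 01\}$. It then checks that distinct functions have distinct triples $(a_{00}, a_{10}, a_{01})$, as Theorem 4.2 asserts, and that their number respects the bound $2^{n|S|} = 2^6$ of Theorem 4.3. The weight range is an assumption that happens to suffice for two variables.

```python
from fractions import Fraction
from itertools import product

POINTS = list(product((1, -1), repeat=2))  # all inputs X = (x1, x2)


def truth_table(w0, w1, w2):
    """Values of sgn(w0 + w1 x1 + w2 x2) on POINTS, or None if F vanishes somewhere."""
    values = [w0 + w1 * x1 + w2 * x2 for x1, x2 in POINTS]
    if 0 in values:
        return None
    return tuple(1 if v > 0 else -1 for v in values)


def coefficients_on_S(table):
    """(a_00, a_10, a_01): spectral coefficients of 1, x1 and x2."""
    a00 = Fraction(sum(table), 4)
    a10 = Fraction(sum(f * x1 for f, (x1, _) in zip(table, POINTS)), 4)
    a01 = Fraction(sum(f * x2 for f, (_, x2) in zip(table, POINTS)), 4)
    return (a00, a10, a01)


# Hypothetical search range for the integer weights; it suffices for n = 2.
WEIGHTS = range(-3, 4)
ltfs = {truth_table(*w) for w in product(WEIGHTS, repeat=3)} - {None}
spectra = {coefficients_on_S(t) for t in ltfs}

if __name__ == "__main__":
    # Number of functions, number of distinct coefficient triples, bound 2^{n|S|}.
    print(len(ltfs), len(spectra), 2 ** (2 * 3))  # 14 14 64
```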


The above turned out to be a generalization of a known [6, 10] upper bound on the number of linear threshold functions, for which $|S| = n + 1$.

5  Lower  Bounds 

The necessary and sufficient condition that is derived above is used to derive lower bounds on the number of monomials in a threshold function, again by using the spectral representation. Let $f(X) = \operatorname{sgn}(F(X))$ be an $S$-threshold function, namely,

$$F(X) = \sum_{\alpha \in S} w_\alpha X^\alpha$$

We want to find lower bounds for $|S|$ as a function of the spectral representation of $f(X)$.

Lemma 5.1 For all $\alpha \in S$:

$$2^n |w_\alpha| \le \sum_{X \in \{1,-1\}^n} |F(X)|$$


Proof: First we prove the statement for $\alpha$ being the all-0 vector:

$$\begin{aligned}
\sum_{X \in \{1,-1\}^n} |F(X)| &= \sum_{F(X) > 0} F(X) - \sum_{F(X) < 0} F(X) \\
&= \sum_{X \in \{1,-1\}^n} F(X) - 2 \sum_{F(X) < 0} F(X) \\
&\overset{(a)}{=} 2^n w_{00\ldots0} - 2 \sum_{F(X) < 0} F(X) \\
&\ge 2^n w_{00\ldots0}
\end{aligned}$$

Note that (a) follows from Lemma 4.2. The proof for arbitrary $\alpha$ follows from the fact that $|X^\alpha| = 1$, hence:

$$|F(X)| = |X^\alpha|\,|F(X)| = |X^\alpha F(X)|$$

Hence, we can make any $w_\alpha$ be the constant term without changing the value of $|F(X)|$. If $w_\alpha < 0$ we take $-F(X)$ and get the result. □

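Theorem 5.1 Let $f(X) = \operatorname{sgn}\left(\sum_{\alpha \in S} w_\alpha X^\alpha\right)$ be an $S$-threshold function and let $\{a_\alpha \mid \alpha \in \{0,1\}^n\}$ be its spectral representation; then

(i) $$|S| \ge \left(\sum_{\alpha \in S} p_\alpha |a_\alpha|\right)^{-1}$$

(ii) $$|S| \ge \frac{1}{a}$$

where

$$p_\alpha = \frac{|w_\alpha|}{\sum_{\beta \in S} |w_\beta|}$$

and

$$a = \max_{\alpha \in S} |a_\alpha|$$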


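Proof: We first prove (i). By Theorem 4.1 and Lemma 5.1, for all $\alpha \in S$:

$$2^n |w_\alpha| \le \sum_{X \in \{1,-1\}^n} |F(X)| = 2^n \sum_{\beta \in S} w_\beta a_\beta$$

We sum the above inequality over all $\alpha \in S$ and get:

$$\sum_{\alpha \in S} |w_\alpha| \le |S| \sum_{\beta \in S} w_\beta a_\beta \le |S| \sum_{\beta \in S} |w_\beta|\,|a_\beta|$$

Dividing by $\sum_{\alpha \in S} |w_\alpha|$ gives (i). For (ii), just notice that $p_\alpha \ge 0$ and $\sum_{\alpha \in S} p_\alpha = 1$, so $\sum_{\alpha \in S} p_\alpha |a_\alpha| \le a$. □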
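Both ingredients of this section are easy to probe numerically. The sketch below (our own illustration; `lemma_5_1_holds`, `monomial_lower_bound` and `ip2` are hypothetical names) first checks the inequality $2^n|w_\alpha| \le \sum_X |F(X)|$ of Lemma 5.1 for randomly drawn integer polynomials (the inequality itself does not need $F(X) \neq 0$), and then evaluates the bound $|S| \ge 1/a$ of Theorem 5.1. Since $\max_{\alpha \in S}|a_\alpha| \le \max_{\alpha}|a_\alpha|$, taking the maximum over all $\alpha$ gives a bound that does not depend on $S$; for IP2 on four variables it yields $|S| \ge 4$.

```python
import random
from fractions import Fraction
from itertools import product


def monomial(X, alpha):
    """X^alpha: product of x_{k+1} over the bits k that are set in alpha."""
    value = 1
    for k, x in enumerate(X):
        if (alpha >> k) & 1:
            value *= x
    return value


def lemma_5_1_holds(w, n):
    """Check 2^n |w_alpha| <= sum_X |F(X)| for every alpha in S = w.keys()."""
    total = sum(abs(sum(wa * monomial(X, a) for a, wa in w.items()))
                for X in product((1, -1), repeat=n))
    return all(2 ** n * abs(wa) <= total for wa in w.values())


def monomial_lower_bound(f, n):
    """1 / max_alpha |a_alpha|, a lower bound on |S| by Theorem 5.1."""
    points = list(product((1, -1), repeat=n))
    a = [Fraction(sum(f(X) * monomial(X, alpha) for X in points), 2 ** n)
         for alpha in range(2 ** n)]
    return 1 / max(abs(c) for c in a)


def ip2(X):
    """IP2 on four variables, TRUE encoded as -1; XOR of +-1 values is a product."""
    and12 = -1 if X[0] == X[1] == -1 else 1
    and34 = -1 if X[2] == X[3] == -1 else 1
    return and12 * and34


rng = random.Random(0)  # fixed seed so the experiment is reproducible
random_ok = all(
    lemma_5_1_holds({a: rng.randint(-5, 5) for a in rng.sample(range(16), 5)}, 4)
    for _ in range(200)
)

if __name__ == "__main__":
    print(random_ok)                     # True
    print(monomial_lower_bound(ip2, 4))  # 4
```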

We  summarize  this  section  by: 

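Corollary 5.1 Fix any $\epsilon > 0$. Let $f(X)$ be a Boolean function of $n$ variables. If $|a_\alpha| \le 2^{-\epsilon n}$ for all $\alpha \in \{0,1\}^n$, then, for $n$ sufficiently large, $f(X)$ is not a polynomial threshold function. Indeed, by Theorem 5.1, any threshold representation of such an $f$ needs $|S| \ge 2^{\epsilon n}$ monomials.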
6 Separating by Semi-Bent Functions


We use Corollary 5.1 to get separation results by looking at functions that have an exponentially small spectrum.

Definition: A Boolean function $f(X)$ is called 'bent' [3, 9, 14] iff $|a_\alpha| = 2^{-n/2}$ for all $\alpha \in \{0,1\}^n$. Notice that bent functions are defined for even $n$ only.


Proposition 6.1 The Inner Product Mod 2 (IP2) function, i.e.,

$$f(X) = (x_1 \wedge x_2) \oplus (x_3 \wedge x_4) \oplus \cdots \oplus (x_{2n-1} \wedge x_{2n})$$

is a bent function.

Proof: See [9]. A sketch of an alternative proof: it can be proven by induction on $n$ that IP2, when written as a vector, is actually an eigenvector (with eigenvalue $2^n$) of a Sylvester-type Hadamard matrix of order $2^{2n}$. Hence, by Theorem 3.1, $|a_\alpha| = 2^{-n}$ for all $\alpha \in \{0,1\}^{2n}$. □
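The eigenvector argument in the proof sketch can be confirmed for small $n$. The sketch below (our own illustration; `ip2_vector` and `check_ip2_bent` are hypothetical names, and TRUE is encoded as $-1$ so that XOR of $\pm 1$ values is their product) builds the value vector $P$ of IP2 on $2n$ variables in the index order of Theorem 3.1, checks $H_{2^{2n}} P = 2^n P$, and hence that every spectral coefficient has absolute value $2^{-n}$.

```python
from fractions import Fraction


def sylvester(n):
    """Sylvester-type Hadamard matrix H_{2^n}, built by recursion (2)."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def ip2_vector(n):
    """Values of IP2 on 2n variables; x_{k+1} = -1 iff bit k of the index is set."""
    values = []
    for i in range(4 ** n):
        X = [-1 if (i >> k) & 1 else 1 for k in range(2 * n)]
        v = 1
        for j in range(n):
            # AND of (x_{2j+1}, x_{2j+2}) is TRUE (-1) iff both inputs are -1;
            # XOR of +-1 values is their product.
            v *= -1 if X[2 * j] == X[2 * j + 1] == -1 else 1
        values.append(v)
    return values


def check_ip2_bent(n):
    """Check H P = 2^n P and |a_alpha| = 2^{-n} for IP2 on 2n variables."""
    P = ip2_vector(n)
    HP = [sum(h * p for h, p in zip(row, P)) for row in sylvester(2 * n)]
    is_eigenvector = HP == [2 ** n * p for p in P]
    spectrum = [Fraction(v, 4 ** n) for v in HP]  # A = 2^{-2n} H P
    return is_eigenvector and all(abs(a) == Fraction(1, 2 ** n) for a in spectrum)


if __name__ == "__main__":
    print([check_ip2_bent(n) for n in (1, 2, 3)])  # [True, True, True]
```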


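Theorem 6.1

$$PT_1 \subsetneq PT_2$$

Proof: By Corollary 5.1 and Proposition 6.1, IP2 $\notin PT_1$. On the other hand, IP2 $\in PT_2$: the AND's are computed in the first level and the XOR is computed in the second level. □

Definition: Let $0 < \epsilon < 1/2$ be fixed. A Boolean function $f(X)$ is called 'semi-bent' iff $|a_\alpha| \le 2^{-\epsilon n}$ for all $\alpha \in \{0,1\}^n$. Notice that a bent function is also a semi-bent function.

Theorem 6.2

$$PT_1 \subsetneq LT_2$$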


Proof: The fact that $PT_1 \subseteq LT_2$ is proved in Theorem 2.1. To show that it is a proper containment we must find a function which is in $LT_2$ but not in $PT_1$. Every symmetric function is in $LT_2$ [5, 11] and every semi-bent function is not in $PT_1$ (Corollary 5.1). Hence, a natural candidate for such a function is a symmetric semi-bent function. Indeed, a symmetric semi-bent function exists for all $n$, as stated in Proposition 6.2 below. □

Consider the function:

$$f(X) = (x_1 \wedge x_2) \oplus (x_1 \wedge x_3) \oplus \cdots \oplus (x_{n-1} \wedge x_n) \qquad (11)$$

That is, $f(X)$ consists of the sum mod 2 of all the possible AND's between pairs of variables. We call this function the Complete Quadratic (CQ) function. Clearly, CQ is a symmetric function.

Proposition  6.2  CQ  is  bent  for  n  even,  and  semi-bent  for  n  odd. 

Proof:  Actually  we  can  compute  the  exact  spectral  representation  of  CQ  for  every  n,  see 
Appendix  B. 
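Proposition 6.2 can also be checked by brute force for small $n$. The sketch below is our own illustration (`cq`, `abs_spectrum` and `matches_proposition_6_2` are hypothetical names; TRUE is encoded as $-1$): it evaluates CQ directly from definition (11) as $(-1)$ raised to the number of pairs $i < j$ with $x_i = x_j = -1$, computes every $|a_\alpha|$, and compares them with $2^{-n/2}$ for even $n$ and with the values $0$ or $2^{-(n-1)/2}$ for odd $n$.

```python
from fractions import Fraction
from itertools import combinations, product


def cq(X):
    """Complete Quadratic function (11), TRUE encoded as -1."""
    true_pairs = sum(1 for xi, xj in combinations(X, 2) if xi == xj == -1)
    return -1 if true_pairs % 2 else 1


def abs_spectrum(f, n):
    """The set {|a_alpha| : alpha in {0,1}^n} computed by direct summation."""
    points = list(product((1, -1), repeat=n))
    values = [f(X) for X in points]
    result = set()
    for alpha in range(2 ** n):
        total = 0
        for X, v in zip(points, values):
            term = v
            for k in range(n):
                if (alpha >> k) & 1:
                    term *= X[k]  # multiply in x_{k+1} when bit k of alpha is set
            total += term
        result.add(abs(Fraction(total, 2 ** n)))
    return result


def matches_proposition_6_2(n):
    """Even n: every |a_alpha| = 2^{-n/2}; odd n: each |a_alpha| is 0 or 2^{-(n-1)/2}."""
    found = abs_spectrum(cq, n)
    if n % 2 == 0:
        return found == {Fraction(1, 2 ** (n // 2))}
    return found <= {Fraction(0), Fraction(1, 2 ** ((n - 1) // 2))}


if __name__ == "__main__":
    print([matches_proposition_6_2(n) for n in range(1, 9)])  # all True
```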

7  Lower  Bounds  for  Circuits 

Here we use the necessary and sufficient condition to derive lower bounds for the number of gates in depth 2 circuits of linear threshold gates.

A depth 2 circuit consists of a single gate that we call $t$ (its output is the output of the circuit) together with a set of $k$ gates whose outputs are inputs to $t$. The $i$'th gate in the first level of the circuit is denoted by $c_i$. Hence, the function computed by the circuit is $t(X) = t(c_1(X), c_2(X), \ldots, c_k(X))$, where $t$ and $c_i$, for all $i \in \{1, \ldots, k\}$, are linear threshold gates.

We  use  the  same  ideas  as  in  Section  4  and  get: 

Theorem 7.1 Let

$$T(X) = w_0 + \sum_{i=1}^{k} w_i c_i(X)$$

and assume $T(X) \neq 0$ for all $X \in \{1,-1\}^n$. Let $t(X)$ be a Boolean function whose spectral representation is $\{a_\alpha \mid \alpha \in \{0,1\}^n\}$, and let the spectral representation of $c_i(X)$ be $\{b^i_\alpha \mid \alpha \in \{0,1\}^n\}$. Then $t(X) = \operatorname{sgn}(T(X))$ for all $X \in \{1,-1\}^n$ iff

$$\sum_{X \in \{1,-1\}^n} |T(X)| = 2^n \left( w_0 a_{00\ldots0} + \sum_{i=1}^{k} w_i \sum_{\alpha \in \{0,1\}^n} a_\alpha b^i_\alpha \right) \qquad (12)$$

Proof: The proof is similar to the proof of Theorem 4.1. □
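Identity (12) rests on the same computation as Theorem 4.1, together with $\sum_X t(X)c_i(X) = 2^n \sum_\alpha a_\alpha b^i_\alpha$. The sketch below builds a small depth 2 circuit on $n = 3$ inputs; the three first-level gates and the output weights are arbitrary hypothetical choices of ours, picked so that $T(X)$ is always odd and hence nonzero. With $t = \operatorname{sgn}(T)$ both sides of (12) agree, while replacing $t$ by $-t$ breaks the equality, as the theorem predicts.

```python
from fractions import Fraction
from itertools import product

N = 3
POINTS = list(product((1, -1), repeat=N))


def sgn(v):
    return 1 if v > 0 else -1


def monomial(X, alpha):
    """X^alpha: product of x_{k+1} over the bits k that are set in alpha."""
    value = 1
    for k, x in enumerate(X):
        if (alpha >> k) & 1:
            value *= x
    return value


def spectrum(f):
    """a_alpha = 2^{-N} * sum_X f(X) X^alpha."""
    return [Fraction(sum(f(X) * monomial(X, alpha) for X in POINTS), 2 ** N)
            for alpha in range(2 ** N)]


# Hypothetical first-level linear threshold gates c_1, c_2, c_3 (odd sums, never 0).
GATES = [
    lambda X: sgn(X[0] + X[1] + X[2]),
    lambda X: sgn(X[0] - X[1] - X[2]),
    lambda X: sgn(-X[0] + X[1] + X[2]),
]
W0, W = 1, [2, 1, 1]  # hypothetical output weights; T(X) is always odd, hence nonzero


def T(X):
    return W0 + sum(wi * c(X) for wi, c in zip(W, GATES))


def t(X):
    return sgn(T(X))


def sides_of_12(t_candidate):
    """Left- and right-hand sides of (12) for a candidate output function."""
    a = spectrum(t_candidate)
    lhs = sum(abs(T(X)) for X in POINTS)
    rhs = 2 ** N * (W0 * a[0] + sum(
        wi * sum(ai * bi for ai, bi in zip(a, spectrum(c)))
        for wi, c in zip(W, GATES)))
    return lhs, rhs


if __name__ == "__main__":
    print(sides_of_12(t))                # the two sides are equal
    print(sides_of_12(lambda X: -t(X)))  # the two sides differ
```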


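We show that the result in [5] that IP2 $\notin LT_2$ can be derived by using the fact that the left-hand side of (12) is greater than or equal to $2^n$; hence:

Proposition 7.1 Let $t(X) = t(c_1(X), c_2(X), \ldots, c_k(X))$; then:

$$k \ge \frac{1 - w_0 a_{00\ldots0}}{w c}$$

where

$$w = \max_{1 \le i \le k} |w_i|$$

and

$$c = \max_{1 \le i \le k} \left| \sum_{\alpha \in \{0,1\}^n} a_\alpha b^i_\alpha \right|$$

$a_\alpha$ and $b^i_\alpha$ have the same meaning as in Theorem 7.1. Indeed, dividing (12) by $2^n$ and using $\sum_X |T(X)| \ge 2^n$ gives $1 \le w_0 a_{00\ldots0} + \sum_{i=1}^{k} w_i \sum_\alpha a_\alpha b^i_\alpha \le w_0 a_{00\ldots0} + k w c$.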


Now let $t(X) =$ IP2. The fact that IP2 $\notin LT_2$ follows from:

1. $w$ is bounded by a polynomial in $n$.

2. $a_{00\ldots0}$ for IP2 is exponentially small.

3. Using the lemma by Lindsey [?, p. 88] about Hadamard matrices, it can be shown that $c$ is exponentially small.

The above technique is effective for getting lower bounds on $k$ whenever $w$ is polynomially bounded and the $c_i$'s are gates or circuits that result in a value of $c$ which is exponentially small.


8  Open  Problems 

The main open problem is to find the exact relation between the class $PT^{\circ} = \bigcup_{d \in \mathbb{N}} PT_d$ and the well-known class $NC^1$. Here, we were concentrating on the relation between subclasses of $PT^{\circ}$ and subclasses of $LT^{\circ}$, with the goal of getting separation in $LT^{\circ}$. In particular we have the following conjecture:

Conjecture: For all $d \in \mathbb{N}$: $LT_d \subsetneq PT_d \subsetneq LT_{2d}$
Appendices


A  Examples 


Here we give some examples of polynomial threshold functions which are not linear threshold functions. To show that a function is polynomial threshold we need to give the explicit $F(X)$, while proving that a function is not linear threshold requires the application of the necessary and sufficient condition developed in the paper. We will work with Boolean functions of $n$ variables, but the reader should think of $n$ as being arbitrary.


We  start  with  a  trivial  example  of  the  well  known  PARITY  function. 


Definition:

$$\mathrm{PARITY}(X) = \begin{cases} -1 & \text{odd number of } -1\text{'s in } X \\ 1 & \text{otherwise} \end{cases}$$

Clearly, $\mathrm{PARITY}(X) = x_1 x_2 \cdots x_n$, so we do not even need the threshold to compute PARITY. To show that PARITY is not in $LT_1$ just notice that the spectral coefficients which correspond to the constant and linear terms are all 0. So there is no $F(X)$, such that $F(X) \neq 0$ for all $X \in \{1,-1\}^n$, that will satisfy the necessary and sufficient condition, because the right-hand side of equation (10) is 0.


The  second  example  is  a  bit  more  complicated. 

Definition: Define the following function over $n$ variables,

$$\mathrm{EXACT}^n_k(X) = \begin{cases} -1 & \text{if the number of } -1\text{'s in } X \text{ is } k \\ 1 & \text{otherwise} \end{cases}$$

Proposition A.1

$$\mathrm{EXACT}^n_k \in PT_1$$


Proof: Consider the Boolean function of $2n$ variables that is defined as follows: it is $-1$ iff the number of $-1$'s in $X$ is equal to the number of $1$'s in $X$. Clearly, this function is $\mathrm{EXACT}^{2n}_n(X)$. We show that this function is in $PT_1$, and the proof follows by reducing $\mathrm{EXACT}^n_k$ to $\mathrm{EXACT}^{2n}_n$.

Let

$$F(X) = (n-1) + x_1 x_2 + x_1 x_3 + \cdots + x_{2n-1} x_{2n}$$

Namely, $F(X)$ consists of a constant term $(n-1)$ and all the $\binom{2n}{2}$ monomials of two variables. We will show that $\mathrm{EXACT}^{2n}_n(X) = \operatorname{sgn}(F(X))$.

Suppose $X$ consists of $(n+m)$ $-1$'s and $(n-m)$ $1$'s. Notice that we want $F(X) < 0$ iff $m = 0$. We calculate the value of $F(X)$ as a function of $m$. A monomial $x_i x_j$ is $-1$ iff exactly one of $x_i, x_j$ is $-1$; hence, excluding the constant term, the number of terms in $F(X)$ that are $-1$ is exactly

$$n^2 - m^2 = (n+m)(n-m)$$

and the rest of the terms are 1. Hence, for $X$ having $(n+m)$ $-1$'s,

$$F(X) = (n-1) + \binom{2n}{2} - 2(n^2 - m^2) = 2m^2 - 1$$

Clearly, $\mathrm{EXACT}^{2n}_n(X) = \operatorname{sgn}(F(X))$.

Notice that $\mathrm{EXACT}^n_k(X) = \mathrm{EXACT}^{2n}_n(X, Y)$, where $Y$ is a vector of length $n$ that consists of $k$ $1$'s and $(n-k)$ $-1$'s, e.g.

$$Y = (\underbrace{1, \ldots, 1}_{k}, \underbrace{-1, \ldots, -1}_{n-k})$$

Substituting this constant $Y$ into $F$ leaves a polynomial in $X$ with $O(n^2)$ monomials. That is, $\mathrm{EXACT}^n_k(X)$ is in $PT_1$. □
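Both steps of this proof can be verified exhaustively for small $n$. In the sketch below (our own illustration; `exact`, `F`, `sgn` and `check` are hypothetical names), every input with $(n+m)$ entries equal to $-1$ must give $F(X) = 2m^2 - 1$ with the sign of $\mathrm{EXACT}^{2n}_n$, and fixing the last $n$ inputs to $Y$ must reproduce $\mathrm{EXACT}^n_k$ for every $k$.

```python
from itertools import combinations, product


def exact(X, k):
    """EXACT^n_k: -1 iff exactly k entries of X equal -1."""
    return -1 if X.count(-1) == k else 1


def F(Z):
    """(n - 1) plus all monomials z_i z_j, i < j, over the 2n entries of Z."""
    n = len(Z) // 2
    return (n - 1) + sum(zi * zj for zi, zj in combinations(Z, 2))


def sgn(v):
    return 1 if v > 0 else -1


def check(n):
    # Closed form F = 2m^2 - 1 and sgn(F) = EXACT^{2n}_n on all 2n-variable inputs.
    for Z in product((1, -1), repeat=2 * n):
        m = Z.count(-1) - n  # Z has (n + m) entries equal to -1
        if F(Z) != 2 * m * m - 1 or sgn(F(Z)) != exact(Z, n):
            return False
    # Reduction EXACT^n_k(X) = EXACT^{2n}_n(X, Y), Y = k ones then n - k minus-ones.
    for k in range(n + 1):
        Y = (1,) * k + (-1,) * (n - k)
        for X in product((1, -1), repeat=n):
            if sgn(F(X + Y)) != exact(X, k):
                return False
    return True


if __name__ == "__main__":
    print([check(n) for n in range(1, 6)])  # [True, True, True, True, True]
```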


Now we show that $\mathrm{EXACT}^n_k$ is not linear threshold.

Proposition A.2

$$\mathrm{EXACT}^n_k \notin LT_1$$

Proof: We will show that the function $\mathrm{EXACT}^{2n}_n$ is not in $LT_1$. It can be shown that the spectral coefficients of $\mathrm{EXACT}^{2n}_n$ that correspond to the linear terms are all 0 and the one that corresponds to the constant term is

$$a_{00\ldots0} = 1 - 2\binom{2n}{n}2^{-2n}$$

Assume the function $\mathrm{EXACT}^{2n}_n$ is in $LT_1$; use Theorem 4.1 and Lemma 5.1 and get that

$$2^{2n}|w_{00\ldots0}| \le 2^{2n}|w_{00\ldots0}|\,|a_{00\ldots0}|$$

Here $w_{00\ldots0} \neq 0$, since otherwise the right-hand side of (10) would be 0. Hence, we get that $|a_{00\ldots0}| \ge 1$, which is a contradiction because $0 < \binom{2n}{n} < 2^{2n}$ implies $|a_{00\ldots0}| < 1$. □
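The two spectral facts used in this proof are easy to confirm numerically. The sketch below (our own illustration; `low_order_spectrum` is a hypothetical name) computes, for $\mathrm{EXACT}^{2n}_n$, the constant coefficient and the linear coefficients by direct summation, so that the former can be compared with $1 - 2\binom{2n}{n}2^{-2n}$. It uses `math.comb`, which requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import comb


def low_order_spectrum(n):
    """(a_{00...0}, linear coefficients) of EXACT^{2n}_n by direct summation."""
    N = 2 * n
    points = list(product((1, -1), repeat=N))
    f = {X: (-1 if X.count(-1) == n else 1) for X in points}
    a0 = Fraction(sum(f.values()), 2 ** N)
    linear = [Fraction(sum(f[X] * X[i] for X in points), 2 ** N) for i in range(N)]
    return a0, linear


if __name__ == "__main__":
    for n in range(1, 6):
        a0, linear = low_order_spectrum(n)
        print(n, a0, a0 == 1 - Fraction(2 * comb(2 * n, n), 4 ** n), set(linear))
```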

The third example is related to linear codes [9].

Definition: Let $C$ be a linear $[n, k]$ block code. Then the characteristic function of $C$ is

$$I_C(X) = \begin{cases} -1 & \text{if } X \in C \\ 1 & \text{otherwise} \end{cases}$$

Here we only state the result without a proof. The idea in the proof is to use the representation of linear block codes that was developed in [1].

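Proposition A.3 Let $C$ be a linear block code; then $I_C \in PT_1$ and $I_C \notin LT_1$.

Example: See [1, 9]. Let $C$ be the $[7,4]$ Hamming code. The parity check matrix of $C$ is given by

$$H^T = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and

$$I_C(X) = \operatorname{sgn}(2 - x_1 x_2 x_4 x_5 - x_1 x_3 x_4 x_6 - x_2 x_3 x_4 x_7)$$

Each monomial corresponds to a column of $H^T$ and equals 1 iff the corresponding parity check is satisfied, so $F(X) = -1$ on codewords and $F(X) \ge 1$ otherwise. In general, for an $[n, k]$ code we need only $(n - k + 1)$ terms in $F(X)$ that are easily calculated from the parity check matrix of the code.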
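The $[7,4]$ example can be checked over all $2^7$ inputs. The sketch below (our own illustration; `CHECKS`, `is_codeword`, `F` and `verify` are hypothetical names) reads the three parity checks off the columns of $H^T$, maps a bit $b$ to $x = (-1)^b$, and confirms that $\operatorname{sgn}(F(X))$ with $F(X) = 2 - x_1x_2x_4x_5 - x_1x_3x_4x_6 - x_2x_3x_4x_7$ equals $-1$ exactly on the 16 codewords. The constant must lie strictly between 1 and 3 for $F(X)$ to be nonzero with the right sign (all three monomials are 1 on a codeword, and each violated check raises $F$ by 2), so the sketch uses 2.

```python
from itertools import product

# Rows of H^T for the [7,4] Hamming code, as printed above; column j is check j.
HT = [
    (1, 1, 0),
    (1, 0, 1),
    (0, 1, 1),
    (1, 1, 1),
    (1, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
]
# 0-based positions taking part in each parity check.
CHECKS = [[i for i in range(7) if HT[i][j]] for j in range(3)]


def is_codeword(bits):
    """Syndrome test: every parity check sums to 0 mod 2."""
    return all(sum(bits[i] for i in check) % 2 == 0 for check in CHECKS)


def F(X):
    """2 minus one monomial per check, e.g. x1 x2 x4 x5 for the first column."""
    total = 2
    for check in CHECKS:
        mono = 1
        for i in check:
            mono *= X[i]
        total -= mono
    return total


def verify():
    """Return (all signs correct, number of codewords) over all 2^7 inputs."""
    ok, codewords = True, 0
    for bits in product((0, 1), repeat=7):
        X = [(-1) ** b for b in bits]  # bit b is mapped to x = (-1)^b
        indicator = -1 if is_codeword(bits) else 1
        value = F(X)
        ok = ok and value != 0 and (1 if value > 0 else -1) == indicator
        codewords += is_codeword(bits)
    return ok, codewords


if __name__ == "__main__":
    print(verify())  # (True, 16)
```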
B The Spectrum of the Complete Quadratic Function


The Complete Quadratic function is defined in Section 6. Here we prove Proposition 6.2. We start by giving an equivalent definition of the function CQ.

Proposition B.1

$$CQ(X) = \begin{cases} 1 & \text{no. of } -1\text{'s in } X \equiv 0 \text{ or } 1 \pmod 4 \\ -1 & \text{otherwise} \end{cases}$$

Proof: Suppose there are $m$ $-1$'s in $X$. Since a pair in equation (11) is $-1$ iff both variables are $-1$, we have exactly $\binom{m}{2}$ pairs which are $-1$. Hence, the value of $CQ(X)$ is determined by the evenness of $\binom{m}{2}$, which is even iff $m \equiv 0$ or $1 \pmod 4$, and the result follows. □
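A quick exhaustive check that the two definitions of CQ agree for small $n$ (our own illustration; `cq`, `cq_mod4` and `definitions_agree` are hypothetical names, and TRUE is encoded as $-1$): `cq` follows definition (11) by counting pairs $i < j$ with $x_i = x_j = -1$, and `cq_mod4` follows Proposition B.1.

```python
from itertools import combinations, product


def cq(X):
    """CQ from definition (11): parity of the pairs (i < j) with x_i = x_j = -1."""
    true_pairs = sum(1 for xi, xj in combinations(X, 2) if xi == xj == -1)
    return -1 if true_pairs % 2 else 1


def cq_mod4(X):
    """CQ via Proposition B.1: 1 iff the number of -1's is 0 or 1 mod 4."""
    return 1 if X.count(-1) % 4 in (0, 1) else -1


def definitions_agree(n):
    return all(cq(X) == cq_mod4(X) for X in product((1, -1), repeat=n))


if __name__ == "__main__":
    print(all(definitions_agree(n) for n in range(1, 11)))  # True
```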

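First we calculate the spectrum for the case when $n$ is even.

Proposition B.2 Let $\{a_\alpha \mid \alpha \in \{0,1\}^n\}$ be the spectral representation of $CQ(X)$. Assume that $n$ is even; then

$$|a_\alpha| = 2^{-n/2}, \quad \forall \alpha \in \{0,1\}^n$$

Proof: The proof is by induction on $n$. For $n = 2$ we have

$$CQ(x_1, x_2) = \frac{1}{2}(1 + x_1 + x_2 - x_1 x_2)$$

Assume the statement is true for $n$ and show that it is true for $(n+2)$. We use the same notation as in Section 3, namely, $P_{2^n}$ represents the vector with the values of CQ and $A_{2^n}$ represents the vector of the spectral coefficients of CQ. Using Proposition B.1 it can be shown that $P_{2^{n+2}}$ can be expressed as a function of $P_{2^n}$:

$$P_{2^{n+2}} = \begin{bmatrix} P_{2^n} \\ \tilde{P}_{2^n} \\ \tilde{P}_{2^n} \\ -P_{2^n} \end{bmatrix}$$

where

$$\tilde{P}_{2^n} = X_{2^n} \circ P_{2^n}$$

$X_{2^n}$ is the vector representation of $\mathrm{PARITY}(X) = x_1 x_2 \cdots x_n$ and '$\circ$' is bitwise multiplication. (The four blocks correspond to $(x_{n+1}, x_{n+2}) = (1,1), (-1,1), (1,-1), (-1,-1)$: if $x_1, \ldots, x_n$ contain $m$ $-1$'s, setting one of $x_{n+1}, x_{n+2}$ to $-1$ creates $m$ new $-1$ pairs, a factor $(-1)^m = x_1 \cdots x_n$, and setting both creates $2m+1$, a factor $-1$.) Hence, by Theorem 3.1,

$$A_{2^{n+2}} = \frac{1}{2^{n+2}} \begin{bmatrix} H_{2^n} & H_{2^n} & H_{2^n} & H_{2^n} \\ H_{2^n} & -H_{2^n} & H_{2^n} & -H_{2^n} \\ H_{2^n} & H_{2^n} & -H_{2^n} & -H_{2^n} \\ H_{2^n} & -H_{2^n} & -H_{2^n} & H_{2^n} \end{bmatrix} \begin{bmatrix} P_{2^n} \\ \tilde{P}_{2^n} \\ \tilde{P}_{2^n} \\ -P_{2^n} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} \tilde{A}_{2^n} \\ A_{2^n} \\ A_{2^n} \\ -\tilde{A}_{2^n} \end{bmatrix}$$

where $\tilde{A}_{2^n}$ is the reflection of $A_{2^n}$ (the same vector read in reverse order). Hence, if the result is true for $n$ it is also true for $n + 2$. □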
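The recursion $A_{2^{n+2}} = \frac{1}{2}(\tilde{A}_{2^n}, A_{2^n}, A_{2^n}, -\tilde{A}_{2^n})$, with $\tilde{A}$ the reversal of $A$, can be compared against a direct computation. The sketch below is our own illustration (`cq_spectrum_direct` and `cq_spectrum_recursive` are hypothetical names; index conventions as in Theorem 3.1, TRUE encoded as $-1$). It starts from $A_4 = \frac{1}{2}(1,1,1,-1)$ for even $n$ and, as an assumption that the same recursion applies, from $A_2 = (1, 0)$ (CQ of one variable is the constant 1) for odd $n$.

```python
from fractions import Fraction
from itertools import combinations


def cq_spectrum_direct(n):
    """a_alpha of CQ by direct summation; bit k of alpha <-> x_{k+1}."""
    values = []
    for i in range(2 ** n):
        X = [-1 if (i >> k) & 1 else 1 for k in range(n)]
        pairs = sum(1 for a, b in combinations(X, 2) if a == b == -1)
        values.append(-1 if pairs % 2 else 1)
    # X^alpha at point i equals (-1)^{popcount(alpha & i)}.
    return [Fraction(sum(v * (-1) ** bin(alpha & i).count("1")
                         for i, v in enumerate(values)), 2 ** n)
            for alpha in range(2 ** n)]


def cq_spectrum_recursive(n):
    """Apply A_{n+2} = (1/2)(rev(A), A, A, -rev(A)); valid for n >= 1."""
    if n % 2 == 0:
        A, m = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 2), Fraction(-1, 2)], 2
    else:
        A, m = [Fraction(1), Fraction(0)], 1
    while m < n:
        R = A[::-1]  # the reflection of A
        A = [x / 2 for x in R + A + A + [-r for r in R]]
        m += 2
    return A


if __name__ == "__main__":
    print([int(4 * a) for a in cq_spectrum_recursive(4)])
    # [-1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1]
```

The printed vector is $4 A_{16}$, matching the example that follows.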


Example: Using the above recursive description of the spectrum of $CQ(X)$ we can calculate $A_{16}$ from $A_4$:

$$A_4 = \frac{1}{2}(1, 1, 1, -1)$$

and

$$A_{16} = \frac{1}{4}(-1, 1, 1, 1,\; 1, 1, 1, -1,\; 1, 1, 1, -1,\; 1, -1, -1, -1)$$


The above is true for $n$ even. For $n$ odd we have:

Proposition B.3 Let $\{a_\alpha \mid \alpha \in \{0,1\}^n\}$ be the spectral representation of $CQ(X)$. Assume that $n$ is odd; then

$$|a_\alpha| = 0 \text{ or } 2^{-(n-1)/2}, \quad \forall \alpha \in \{0,1\}^n$$

The proof is similar to the even case. We use induction on $n$ and can write the recursive description of the spectrum.

Example: Let $n = 3$; then

$$CQ(x_1, x_2, x_3) = \frac{1}{2}(x_1 + x_2 + x_3 - x_1 x_2 x_3)$$